Abstract

For some well-known games, such as the Traveler's Dilemma or the Cen- 
tipede Game, traditional game-theoretic solution concepts — and most notably 
Nash equilibrium — predict outcomes that are not consistent with empirical ob- 
servations. In this paper, we introduce a new solution concept, iterated regret 
minimization, which exhibits the same qualitative behavior as that observed in 
experiments in many games of interest, including Traveler's Dilemma, the Cen- 
tipede Game, Nash bargaining, and Bertrand competition. As the name suggests, 
iterated regret minimization involves the iterated deletion of strategies that do 
not minimize regret. 

Keywords: Alternative solution concepts, regret minimization. 



*First draft from October 2007. We thank Geir Asheim, Sergei Izmalkov, Adam and Ehud Kalai, 
Silvio Micali, Henry Schneider, Kare Vernby, and seminar participants at GAMES 2008 for helpful 
discussions. Halpern is supported in part by NSF under grants ITR-0325453, IIS-0534064, and
IIS-0812045, and AFOSR. Pass is supported in part by an NSF CAREER Award CCF-0746990, AFOSR
Award FA9550-08-1-0197, and BSF Grant 2006317. 



1 Introduction 



Perhaps the most common solution concept considered in game theory is Nash equilibrium.
Various refinements of Nash equilibrium have been considered, such as sequential
equilibrium [Kreps and Wilson 1982], perfect equilibrium [Selten 1975], and
rationalizability [Bernheim 1984; Pearce 1984]; see [Osborne and Rubinstein 1994] for an
overview. There are, however, games where none of these concepts seems appropriate.

Consider the well-known Traveler's Dilemma [Basu 1994; Basu 2007]. Suppose that
two travelers have identical luggage, for which they both paid the same price. Their 
luggage is damaged (in an identical way) by an airline. The airline offers to recompense 
them for their luggage. They may ask for any dollar amount between $2 and $100. 
There is only one catch. If they ask for the same amount, then that is what they will 
both receive. However, if they ask for different amounts (say one asks for $m and the
other for $m', with m < m'), then whoever asks for $m (the lower amount) will get
$(m + p), while the other traveler will get $(m - p), where p can be viewed as a reward
for the person who asked for the lower amount, and a penalty for the person who asked 
for the higher amount. 

It seems at first blush that both travelers should ask for $100, the maximum amount, 
for then they will both get that. The problem is that one of them might then realize 
that he is actually better off asking for $99 if the other traveler asks for $100, since he 
then gets $101. In fact, $99 weakly dominates $100, in that no matter what Traveler 1
asks for, Traveler 2 is always at least as well off asking for $99 as asking for $100, and in
at least one case (if Traveler 1 asks for $100) Traveler 2 is strictly better off asking for $99. Thus,
it seems we can eliminate 100 as an amount to ask for. However, if we eliminate 100, a 
similar argument shows that 98 weakly dominates 99! And once we eliminate 99, then 
97 weakly dominates 98. Continuing this argument (technically, doing iterated deletion 
of weakly dominated strategies) both travelers end up asking for $2! In fact, it is easy 
to see that (2,2) is the only Nash equilibrium. With any other pair of requests, at least 
one of the travelers would want to change his request if he knew what the other traveler 
was asking. Since (2,2) is the only Nash equilibrium, it is also the only sequential and
perfect equilibrium. Moreover, it is the only rationalizable strategy profile. (It is not
necessary to understand these solution concepts in detail; the only point we are trying
to make here is that all standard solution concepts lead to (2,2).)
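
The unraveling argument can be checked mechanically. The following Python sketch is an illustration only, not part of the original analysis: the function names are our own, the penalty/reward is fixed at p = 2, only domination by pure strategies is considered, and both (symmetric) players delete simultaneously in each round. The iterated deletion is run on the smaller claim range $2 to $30 purely to keep the brute-force loop fast; the logic is identical for $2 to $100.

```python
def payoff(mine, other, p=2):
    """Traveler's Dilemma payoff to the player asking `mine` when the other asks `other`."""
    if mine == other:
        return mine
    if mine < other:
        return mine + p      # lower claim: own claim plus the reward p
    return other - p         # higher claim: the lower claim minus the penalty p

def weakly_dominated(a, actions, p):
    """True if some pure b in `actions` weakly dominates a, opponents restricted to `actions`."""
    for b in actions:
        if b == a:
            continue
        diffs = [payoff(b, o, p) - payoff(a, o, p) for o in actions]
        if min(diffs) >= 0 and max(diffs) > 0:
            return True
    return False

def iterated_weak_deletion(lo=2, hi=100, p=2):
    """Repeatedly delete every weakly dominated claim until nothing changes."""
    actions = list(range(lo, hi + 1))
    while True:
        survivors = [a for a in actions if not weakly_dominated(a, actions, p)]
        if survivors == actions:
            return actions
        actions = survivors

def pure_nash(lo=2, hi=100, p=2):
    """All pure profiles in which each claim is a best response to the other."""
    actions = range(lo, hi + 1)
    best = {o: max(payoff(a, o, p) for a in actions) for o in actions}
    return [(a, b) for a in actions for b in actions
            if payoff(a, b, p) == best[b] and payoff(b, a, p) == best[a]]

survivors_small = iterated_weak_deletion(hi=30)   # reduced range, for speed only
equilibria = pure_nash()                          # full range $2..$100
```

Under these assumptions, `survivors_small` contains only the claim 2 and `equilibria` contains only the profile (2, 2), in line with the argument above.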

This seems like a strange result. It seems that no reasonable person — even a game 
theorist! — would ever play 2. Indeed, when the Traveler's Dilemma was empirically 
tested among game theorists (with p = 2) they typically did not play anywhere close 
to 2. Becker, Carter, and Naeve [2005] asked members of the Game Theory Society
(presumably, all experts in game theory) to submit a strategy for the game. Fifty-one 
of them did so. Of the 45 that submitted pure strategies, 33 submitted a strategy 
of 95 or higher, and 38 submitted a strategy of 90 or higher; only 3 submitted the 
"recommended" strategy of 2. The strategy that performed best (in pairwise matchups 
against all submitted strategies) was 97, which had an average payoff of $85.09. The 
worst average payoff went to those who played 2; it was only $3.92. 

Another sequence of experiments by Capra et al. [1999] showed, among other things,
that this result was quite sensitive to the choice of p. For low values of p, people tended 
to play high values, and keep playing them when the game was repeated. By way of 
contrast, for high values of p, people started much lower, and converged to playing 2 
after a few rounds of repeated play. The standard solution concepts (Nash equilibrium, 
rationalizability, etc.) are all insensitive to the choice of p; for example, (2,2) is the 
only Nash equilibrium for all choices of p > 1.

In this paper, we introduce a new solution concept, iterated regret minimization, 
which has the same qualitative behavior as that observed in the experiments, not 
just in Traveler's Dilemma, but in many other games that have proved problematic 
for Nash equilibrium, including the Centipede Game, Nash bargaining, and Bertrand 
competition. In this paper, we focus on iterated regret minimization in strategic games, 
and comment on how it can be applied to Bayesian games. 

The rest of this paper is organized as follows. Section 2 contains preliminaries.
Section 3 is the heart of the paper: we first define iterated regret minimization in
strategic games, provide an epistemic characterization of it, and then show how iterated
regret minimization works in numerous standard examples from the game theory
literature, including Traveler's Dilemma, Prisoner's Dilemma, the Centipede Game,
Bertrand Competition, and Nash Bargaining, both with pure strategies and mixed
strategies. The epistemic characterization, like those of many other solution concepts,
involves higher and higher levels of belief regarding other players' rationality, but it does
not involve common knowledge or common belief. Rather, higher levels of beliefs are
accorded lower levels of likelihood. In Sections 4 and 5 we briefly consider regret
minimization in Bayesian games and in the context of mechanism design. We discuss related
work in Section 6 and conclude in Section 7. Proofs are relegated to the appendix.

2 Preliminaries 

We refer to a collection of values, one for each player, as a profile. If player j's value is
\(x_j\), then the resulting profile is denoted \((x_j)_{j \in [n]}\), or simply \((x_j)\) or \(x\), if the set of players
is clear from the context. Given a profile \(x\), let \(x_{-i}\) denote the collection consisting of
all values \(x_j\) for \(j \neq i\). It is sometimes convenient to denote the profile \(x\) as \((x_i, x_{-i})\).

A strategic game in normal form is a "single-shot" game, where each player i chooses
an action from a space \(A_i\) of actions. For simplicity, we restrict our attention to finite
games, i.e., games where the set \(A_i\) is finite. Let \(A = A_1 \times \cdots \times A_n\) be the set of action
profiles. A strategic game is characterized by a tuple \(([n], A, u)\), where \([n]\) is the set of
players, \(A\) is the set of action profiles, and \(u\) is the profile of utility functions, where
\(u_i(a)\) is player i's utility or payoff if the action profile \(a\) is played. A (mixed) strategy for
player i is a probability distribution \(\sigma_i \in \Delta(A_i)\) (where, as usual, we denote by \(\Delta(X)\)
the set of distributions on the set \(X\)). Let \(\Sigma_i = \Delta(A_i)\) denote the mixed strategies
for player i in game G, and let \(\Sigma = \Sigma_1 \times \cdots \times \Sigma_n\) denote the set of mixed strategy
profiles. Note that, in strategy profiles in \(\Sigma\), players are randomizing independently.
A pure strategy for player i is a strategy for i that assigns probability 1 to a single
action. To simplify notation, we let an action \(a_i \in A_i\) also denote the pure strategy
\(\sigma_i \in \Delta(A_i)\) which puts weight only on \(a_i\). If \(\sigma\) is a strategy for player i, then \(\sigma(a)\) denotes
the probability given to action \(a\) by strategy \(\sigma\). Given a strategy profile \(\sigma\), player i's
expected utility if \(\sigma\) is played, denoted \(U_i(\sigma)\), is \(E_{\Pr}[u_i]\), where the expectation is taken
with respect to the probability \(\Pr\) induced by \(\sigma\) (where the players are assumed to
choose their actions independently).
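
To make the definition of expected utility concrete, the following Python sketch computes the expected payoff of a mixed strategy profile by summing over action profiles, weighting each by the product of the players' independent probabilities. The representation (one dictionary per player mapping actions to probabilities) and the matching-pennies payoff used as a test case are our own illustrative assumptions.

```python
from itertools import product

def expected_utility(u_i, sigma):
    """Expected payoff of one player under independent mixing.

    u_i   : function from an action profile (a tuple) to that player's payoff
    sigma : list of dicts; sigma[j][a_j] is the probability that player j plays a_j
    """
    total = 0.0
    for profile in product(*(s.keys() for s in sigma)):
        prob = 1.0
        for j, a_j in enumerate(profile):
            prob *= sigma[j][a_j]          # independence: multiply marginals
        total += prob * u_i(profile)
    return total

# Hypothetical test case: player 1's payoff in matching pennies.
pennies = lambda a: 1 if a[0] == a[1] else -1
uniform = {"H": 0.5, "T": 0.5}
value_mixed = expected_utility(pennies, [uniform, uniform])
value_pure = expected_utility(pennies, [{"H": 1.0}, {"H": 1.0}])
```

A pure strategy is simply a dictionary that puts probability 1 on a single action, which mirrors the convention of identifying an action with the pure strategy that plays it.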



3 Iterated Regret Minimization in Strategic Games 

We start by providing an informal discussion of iterated regret minimization in strate- 
gic games, and applying it to the Traveler's Dilemma; we then give a more formal 
treatment. 

Nash equilibrium implicitly assumes that the players know what strategy the other
players are using. (See [Aumann and Brandenburger 1995] for a discussion of the
knowledge required for Nash equilibrium.) Such knowledge seems unreasonable,
especially in one-shot games. Regret minimization is one way of trying to capture the
intuition that a player wants to do well no matter what the other players do.

The idea of minimizing regret was introduced (independently) in decision theory by
Savage [1951] and Niehans [1948]. To explain how we use it in a game-theoretic context,
we first review how it works in a single-agent decision problem. Suppose that an agent
chooses an act from a set \(A\) of acts. The agent is uncertain as to the true state of the
world; there is a set \(S\) of possible states. Associated with each state \(s \in S\) and act \(a \in A\)
is the utility \(u(a, s)\) of performing act \(a\) if \(s\) is the true state of the world. For simplicity,
we take \(S\) and \(A\) to be finite here. The idea behind the minimax regret rule is to hedge
the agent's bets, by doing reasonably well no matter what the actual state is. For each
state \(s\), let \(u^*(s)\) be the best outcome in state \(s\); that is, \(u^*(s) = \max_{a \in A} u(a, s)\). The
regret of \(a\) in state \(s\), denoted \(\mathit{regret}_u(a, s)\), is \(u^*(s) - u(a, s)\); that is, the regret of
\(a\) in \(s\) is the difference between the utility of the best possible outcome in \(s\) and the
utility of performing act \(a\) in \(s\). Let \(\mathit{regret}_u(a) = \max_{s \in S} \mathit{regret}_u(a, s)\). For example, if
\(\mathit{regret}_u(a) = 2\), then in each state \(s\), the utility of performing \(a\) in \(s\) is guaranteed to
be within 2 of the utility of any act the agent could choose, even if she knew that the
actual state was \(s\). The minimax-regret decision rule orders acts by their regret; the
"best" act is the one that minimizes regret. Intuitively, this rule is trying to minimize
the regret that an agent would feel if she discovered what the situation actually was: the
"I wish I had chosen a' instead of a" feeling.

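The single-agent minimax regret rule just described can be sketched in a few lines of Python. The utility table below (an umbrella decision with two states) is a hypothetical example of our own, chosen only to exercise the definitions of the best outcome per state, the regret of an act in a state, and the maximum regret of an act.

```python
def minimax_regret(utility):
    """utility[a][s] is the utility of act a in state s.

    Returns (regret, best_acts): the maximum regret of each act and the
    acts whose maximum regret is smallest.
    """
    acts = list(utility)
    states = list(next(iter(utility.values())))
    best = {s: max(utility[a][s] for a in acts) for s in states}            # best outcome per state
    regret = {a: max(best[s] - utility[a][s] for s in states) for a in acts}
    least = min(regret.values())
    return regret, [a for a in acts if regret[a] == least]

# Hypothetical utilities, for illustration only.
table = {
    "umbrella":    {"rain": 5, "sun": 3},
    "no_umbrella": {"rain": 0, "sun": 6},
}
regret, best_acts = minimax_regret(table)
```

Here taking the umbrella has maximum regret 3 (in the sunny state) and not taking it has maximum regret 5 (in the rainy state), so the rule selects the umbrella.
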
Despite having been used in decision making for over 50 years, up until recently, 
there seems to have been no attempt to apply regret minimization in the context of 
game theory. We discuss other recent work on applying regret minimization to game 
theory in Section 6; here, we describe our own approach. For ease of exposition, we
start by explaining it in the context of the Traveler's Dilemma. We take the acts for 
one player to be that player's pure strategy choices and take the states to be the other 
player's pure strategy choices. Each act-state pair is then just a strategy profile; the 
utility of the act-state pair for player i is just the payoff to player i of the strategy 
profile. Intuitively, each agent is uncertain about what the other agent will do, and is 
trying to choose an act that will minimize his regret, given that uncertainty. 

It is easy to see that, if the penalty/reward p < 49, then the acts that minimize
regret are the ones in the interval [100 - 2p, 100]; the regret for all these acts is 2p - 1.
For if the other player asks for $m < 100 - 2p, then the best response is to ask for
$(m - 1), which results in a payoff of $(m - 1 + p), while asking for \(m' \in [100 - 2p, 100]\)
results in a payoff of $(m - p), so the regret is 2p - 1. The regret may be less (but will not
be more) if the other player asks for an amount \(m \in [100 - 2p, 100]\). It is also easy to
see that every other strategy has higher regret. For example, if the other player plays
100, then the best response is 99; the regret of someone who chooses m < 99 is 99 - m,
so if m < 100 - 2p, then the regret is greater than 2p - 1. On the other hand, if p > 50,
then the unique act that minimizes regret is asking for $2.

Suppose that p < 50. Applying regret minimization once suggests that we consider
a strategy in the interval [100 - 2p, 100]. But we can iterate this process. If we assume
that both players use a strategy in this interval, then the strategy that minimizes regret
is that of asking for $(100 - 2p + 1). A straightforward check shows that this has regret
2p - 2; all other strategies have regret 2p - 1. In the special case that p = 2, this
approach singles out the strategy of asking for $97, which was found to be the best
strategy by Becker, Carter, and Naeve [2005].
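
The two rounds of deletion described above can be reproduced by brute force. The following Python sketch is our own illustration (the function names are assumptions, and the symmetry of the game is used so that a single set of surviving claims is tracked for both players). It records, for each round, the surviving claims and their common maximum regret.

```python
def payoff(mine, other, p):
    """Traveler's Dilemma payoff to the player asking `mine`."""
    if mine == other:
        return mine
    return mine + p if mine < other else other - p

def regret_minimizers(actions, p):
    """One round: claims in `actions` with least maximum regret, both players restricted to `actions`."""
    best = {o: max(payoff(a, o, p) for a in actions) for o in actions}
    regret = {a: max(best[o] - payoff(a, o, p) for o in actions) for a in actions}
    least = min(regret.values())
    return [a for a in actions if regret[a] == least], least

def iterated_regret_minimization(p, lo=2, hi=100):
    """Iterate until the surviving set stops shrinking; return the per-round history."""
    actions = list(range(lo, hi + 1))
    history = []
    while True:
        survivors, least = regret_minimizers(actions, p)
        history.append((survivors, least))
        if survivors == actions:
            return history
        actions = survivors

history_p2 = iterated_regret_minimization(p=2)
history_p5 = iterated_regret_minimization(p=5)
```

For p = 2 the first round leaves the claims 96 to 100 with regret 3 = 2p - 1, and the second round leaves only 97 with regret 2 = 2p - 2. For p = 5 the first round leaves 90 to 100 and the second leaves 91, again matching the formula 100 - 2p + 1.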

As p increases, the act that survives this iterated deletion process goes down, reaching
2 if p > 50. This matches, at a qualitative level, the findings of Capra et al. [1999].^1

3.1 Deletion Operators and Iterated Regret Minimization 

Iterated regret minimization proceeds much like other notions of iterated deletion. To 
put it in context, we first abstract the notion of iterated deletion. 

Let \(G = ([n], A, u)\) be a strategic game. We define iterated regret minimization in
a way that makes it clear how it relates to other solution concepts based on iterated
deletion. A deletion operator \(\mathcal{D}\) maps sets \(S = S_1 \times \cdots \times S_n\) of strategy profiles in G to
sets of strategy profiles such that \(\mathcal{D}(S) \subseteq S\). Moreover, \(\mathcal{D}(S) = \mathcal{D}_1(S) \times \cdots \times \mathcal{D}_n(S)\),
where \(\mathcal{D}_i\) maps sets of strategy profiles to sets of strategies for player i. Intuitively, \(\mathcal{D}_i(S)\) is
the set of strategies for player i that survive deletion, given that we start with S. Note
that the set of strategies that survive deletion may depend on the set that we start
with. Iterated deletion then amounts to applying the \(\mathcal{D}\) operator repeatedly, starting with
an appropriate initial set \(S_0\) of strategies, where \(S_0\) is typically either the set of pure
strategy profiles (i.e., action profiles) in G or the set of mixed strategy profiles in G.

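Definition 3.1 Given a deletion operator \(\mathcal{D}\) and an initial set \(S_0\) of strategies, the
set of strategy profiles that survive iterated deletion with respect to \(\mathcal{D}\) and \(S_0\) is

\[ \mathcal{D}^\infty(S_0) = \bigcap_{k > 0} \mathcal{D}^k(S_0) \]

(where \(\mathcal{D}^1(S) = \mathcal{D}(S)\) and \(\mathcal{D}^{k+1}(S) = \mathcal{D}(\mathcal{D}^k(S))\)). Similarly, the set of strategies
for player i that survive iterated deletion with respect to \(\mathcal{D}\) and \(S_0\) is
\(\mathcal{D}_i^\infty(S_0) = \bigcap_{k > 0} \mathcal{D}_i^k(S_0)\), where \(\mathcal{D}_i^1 = \mathcal{D}_i\) and \(\mathcal{D}_i^{k+1} = \mathcal{D}_i \circ \mathcal{D}^k\).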
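
For finite strategy sets, the intersection in Definition 3.1 is reached after finitely many steps, since each application of the operator can only shrink the sets. The following Python sketch iterates an arbitrary deletion operator to its fixed point. The representation (a tuple of per-player frozensets) and the toy operator used to exercise it are our own assumptions, not part of the definition.

```python
def iterate_deletion(delete, initial):
    """Apply a deletion operator until nothing more is deleted.

    delete  : maps a tuple of per-player sets to a tuple of per-player subsets
    initial : iterable of per-player collections of strategies
    """
    current = tuple(frozenset(s) for s in initial)
    while True:
        nxt = tuple(frozenset(s) for s in delete(current))
        # A deletion operator may only remove strategies, never add them.
        assert all(n <= c for n, c in zip(nxt, current))
        if nxt == current:
            return current
        current = nxt

# Hypothetical toy operator: each player drops their largest remaining
# strategy as long as more than one remains.
toy = lambda S: tuple(s - {max(s)} if len(s) > 1 else s for s in S)
fixed_point = iterate_deletion(toy, [{1, 2, 3}, {5, 7}])
```

Any operator of the right shape, including the dominance-based and regret-based operators discussed in this section, could be passed in place of the toy operator.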

^1 Capra et al. actually considered a slightly different game where the minimum bid was p (rather
than 2). If we instead consider this game, we get an even closer qualitative match to their experimental 
observations. 
We can now define the deletion operator \(\mathcal{RM}\) appropriate for regret minimization
in strategic games (we deal with Bayesian games in Section 4). Intuitively, \(\mathcal{RM}_i(S)\)
consists of all the strategies in \(S_i\) that minimize regret, given that the other players are
using a strategy in \(S_{-i}\). In more detail, we proceed as follows. Suppose that G is a
strategic game \(([n], A, u)\) and that \(S \subseteq A\), the set of pure strategy profiles (i.e., actions).
For \(a_{-i} \in S_{-i}\), let \(u_i^{S_i}(a_{-i}) = \max_{a_i \in S_i} u_i(a_i, a_{-i})\). Thus, \(u_i^{S_i}(a_{-i})\) is the best outcome for
i given that the remaining players play \(a_{-i}\) and that i can select actions only in \(S_i\). For
\(a_i \in S_i\) and \(a_{-i} \in S_{-i}\), let the regret of \(a_i\) for player i given \(a_{-i}\) relative to \(S_i\), denoted
\(\mathit{regret}_i^{S_i}(a_i \mid a_{-i})\), be \(u_i^{S_i}(a_{-i}) - u_i(a_i, a_{-i})\). Let
\(\mathit{regret}_i^{S}(a_i) = \max_{a_{-i} \in S_{-i}} \mathit{regret}_i^{S_i}(a_i \mid a_{-i})\)
denote the maximum regret of \(a_i\) for player i (given that the other players' actions
are chosen from \(S_{-i}\)). Let \(\mathit{minregret}_i^{S} = \min_{a_i \in S_i} \mathit{regret}_i^{S}(a_i)\) be the minimum regret
for player i relative to S. Finally, let

\[ \mathcal{RM}_i(S) = \{ a_i \in S_i : \mathit{regret}_i^{S}(a_i) = \mathit{minregret}_i^{S} \}. \]

Thus, \(\mathcal{RM}_i(S)\) consists of the set of actions that achieve the minimal regret with
respect to S. Clearly \(\mathcal{RM}_i(S) \subseteq S_i\). Let \(\mathcal{RM}(S) = \mathcal{RM}_1(S) \times \cdots \times \mathcal{RM}_n(S)\).
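
The construction above translates directly into code for finite games with pure strategies. The following Python sketch is an illustration under our own assumptions: players are indexed from 0, the payoff function takes a player index and an action profile, and the Prisoner's Dilemma payoffs used to exercise it are conventional textbook numbers rather than values from this paper.

```python
from itertools import product

def with_action(i, a_i, rest):
    """Insert player i's action a_i into the tuple `rest` of the other players' actions."""
    return rest[:i] + (a_i,) + rest[i:]

def regret_min_operator(S, u):
    """One application of the regret-minimization deletion operator (pure strategies).

    S : list of lists; S[i] holds player i's remaining actions
    u : function (i, profile) -> payoff of player i at the action profile
    """
    n = len(S)
    survivors = []
    for i in range(n):
        others = list(product(*(S[j] for j in range(n) if j != i)))
        # Best payoff player i can obtain within S[i] against each opposing profile.
        best = {rest: max(u(i, with_action(i, b, rest)) for b in S[i]) for rest in others}
        max_regret = {
            a: max(best[rest] - u(i, with_action(i, a, rest)) for rest in others)
            for a in S[i]
        }
        least = min(max_regret.values())
        survivors.append([a for a in S[i] if max_regret[a] == least])
    return survivors

# Hypothetical Prisoner's Dilemma payoffs (row player first), for illustration.
PD = {("C", "C"): (3, 3), ("C", "D"): (0, 5), ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
result = regret_min_operator([["C", "D"], ["C", "D"]], lambda i, a: PD[a][i])
```

With these payoffs, defecting has maximum regret 0 and cooperating has maximum regret 2 for each player, so only "D" survives for both.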

If S consists of mixed strategies, then the construction of \(\mathcal{RM}(S)\) is the same,
except that the expected utility operator \(U_i\) is used rather than \(u_i\) in the definition of
\(\mathit{regret}_i^{S}\). We also need to argue that the maximum in the definition of \(\mathit{regret}_i^{S}\)
and the minimum in the definition of \(\mathit{minregret}_i^{S}\) are attained. This follows from the compactness of the
sets over which the max and min are taken, and the continuity of the functions being
maximized and minimized.

Definition 3.2 Let \(G = ([n], A, u)\) be a strategic game. \(\mathcal{RM}_i^\infty(A)\) is the set of (pure)
strategies for player i that survive iterated regret minimization with respect to pure
strategies in G. Similarly, \(\mathcal{RM}_i^\infty(\Sigma(A))\) is the set of (mixed) strategies for player i that
survive iterated regret minimization with respect to mixed strategies in G.

The following theorem, whose proof is in the appendix, shows that iterated regret
minimization is a reasonable concept in that, for all games G, \(\mathcal{RM}^\infty(A)\) and
\(\mathcal{RM}^\infty(\Sigma(A))\) are nonempty fixed points of the deletion process, that is,
\(\mathcal{RM}(\mathcal{RM}^\infty(A)) = \mathcal{RM}^\infty(A)\) and
\(\mathcal{RM}(\mathcal{RM}^\infty(\Sigma(A))) = \mathcal{RM}^\infty(\Sigma(A))\); the deletion process converges at
\(\mathcal{RM}^\infty\). (Our proof actually shows that for any nonempty closed set S of strategies,
the set \(\mathcal{RM}^\infty(S)\) is nonempty and is a fixed point of the deletion process.)

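Theorem 3.3 Let \(G = ([n], A, u)\) be a strategic game. If S is a closed, nonempty
set of strategies of the form \(S_1 \times \cdots \times S_n\), then \(\mathcal{RM}^\infty(S)\) is nonempty,
\(\mathcal{RM}^\infty(S) = \mathcal{RM}_1^\infty(S) \times \cdots \times \mathcal{RM}_n^\infty(S)\), and
\(\mathcal{RM}(\mathcal{RM}^\infty(S)) = \mathcal{RM}^\infty(S)\).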

Unlike standard solution concepts that involve equilibrium and, implicitly, knowl- 
edge of the other agents' strategies, in a strategy profile that survives iterated regret 
minimization, a player is not making a best response to the strategies used by the other 
players since, intuitively, he does not know what these strategies are. As a result, a 
player chooses a strategy that ensures that he does reasonably well compared to the 
best he could have done, no matter what the other players do. We shall see the impact 
of this in the examples of Section 3.4.

3.2 Comparison to Other Solution Concepts Involving Iterated Deletion

Iterated deletion has been applied in other solution concepts. We mention three here. 
Given a set S of strategies, a strategy \(\sigma \in S_i\) is weakly dominated by \(\tau \in S_i\) with
respect to S if, for some strategy \(\sigma_{-i} \in S_{-i}\), we have \(U_i(\sigma, \sigma_{-i}) < U_i(\tau, \sigma_{-i})\) and, for all
strategies \(\sigma'_{-i} \in S_{-i}\), we have \(U_i(\sigma, \sigma'_{-i}) \le U_i(\tau, \sigma'_{-i})\). Similarly, \(\sigma\) is strongly dominated
by \(\tau\) with respect to S if \(U_i(\sigma, \sigma'_{-i}) < U_i(\tau, \sigma'_{-i})\) for all strategies \(\sigma'_{-i} \in S_{-i}\). Thus, if
\(\sigma\) is weakly dominated by \(\tau\) with respect to S, then i always does at least as well
with \(\tau\) as with \(\sigma\), and sometimes does better (given that we restrict to strategies in
\(S_{-i}\)); if \(\sigma\) is strongly dominated by \(\tau\), then player i always does better with \(\tau\) than with
\(\sigma\). Let \(\mathcal{WD}_i(S)\) (resp., \(\mathcal{SD}_i(S)\)) consist of all strategies \(\sigma_i \in S_i\) that are not weakly
(resp., strongly) dominated by some strategy in \(S_i\) with respect to S. We can then
define the pure strategies that survive iterated weak (resp., strong) deletion with respect
to pure strategies as \(\mathcal{WD}^\infty(A)\) (resp., \(\mathcal{SD}^\infty(A)\)). And again, we can start with \(\Sigma\) to get
corresponding notions for mixed strategies.
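
As a concrete check of these two definitions, the following Python sketch applies one round of each operator to the Traveler's Dilemma with p = 2. It is a simplified illustration of our own: only pure strategies are considered, both as candidates for deletion and as potential dominators, so it says nothing about domination by mixed strategies.

```python
def payoff(mine, other, p=2):
    """Traveler's Dilemma payoff to the player asking `mine`."""
    if mine == other:
        return mine
    return mine + p if mine < other else other - p

def dominated(a, own, others, u, strict):
    """Is a dominated by some pure b in `own` (strongly if strict, else weakly)?"""
    for b in own:
        if b == a:
            continue
        diffs = [u(b, o) - u(a, o) for o in others]
        if strict and min(diffs) > 0:
            return True
        if not strict and min(diffs) >= 0 and max(diffs) > 0:
            return True
    return False

def undominated(own, others, u, strict):
    """One round of deletion: keep the actions that no pure action dominates."""
    return [a for a in own if not dominated(a, own, others, u, strict)]

claims = list(range(2, 101))
weak_survivors = undominated(claims, claims, payoff, strict=False)
strong_survivors = undominated(claims, claims, payoff, strict=True)
```

In this restricted setting, the first round of weak deletion removes only the claim 100, while no claim is strongly dominated by another pure claim.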

As is well known [Osborne and Rubinstein 1994], the rationalizable strategies can
also be considered as the outcome of an iterated deletion process. Intuitively, a pure
strategy for player i is rationalizable if it is a best response to some beliefs that player
i may have about the pure strategies that other players are following. Given a set
S of pure strategy profiles, \(\sigma \in S_i\) is justifiable if there is some distribution \(\mu\) on
the strategies in \(S_{-i}\) such that \(\sigma\) is a best response to the resulting mixed strategy.
Intuitively, \(\mu\) describes player i's beliefs about the likelihood that other players are
following various strategies; thus, a strategy \(\sigma\) for i is justifiable if there are beliefs that
i could have to which \(\sigma\) is a best response. Let \(\mathcal{J}_i(S)\) consist of all strategies for player
i that are justifiable with respect to S. A pure strategy \(\sigma\) for player i is rationalizable
if \(\sigma \in \mathcal{J}_i^\infty(A)\).^2

3.3 An Epistemic Characterization of Iterated Regret Minimization

It is well known that rationalizability can be characterized in terms of common knowledge
of rationality [Tan and Werlang 1988], where a player is rational if he has some
beliefs according to which what he does is a best response. Thus, it is common knowl- 
edge among the players that the rationalizable strategies are a best response to some 
beliefs whose support is the set of strategies that remain after iterated deletion. 

At first blush, it may seem that other notions of iterated deletion can also be 
characterized in terms of common knowledge. Intuitively, suppose that it is common knowledge
that all players are using the same deletion process, starting with a commonly known set
of initial strategy profiles. Since they are all intelligent, they all delete once. Realizing
that they have all deleted once, they delete again with respect to the smaller set of 
strategy profiles; realizing this, they delete again; and so on. 



^2 The notion of rationalizability is typically applied to pure strategies, although the definitions can
be easily extended to deal with mixed strategies. 
This intuition is essentially true in the case of iterated deletion of weakly dominated
strategies, although making it precise is somewhat more subtle. The justification for
deleting a weakly dominated strategy is the existence of other strategies. But this
justification may disappear in later deletions. As Mas-Colell, Whinston, and Green [1995,
p. 240] put it in their textbook when discussing iterated deletion of weakly dominated
strategies:

[T]he argument for deletion of a weakly dominated strategy for player i 
is that he contemplates the possibility that every strategy combination of 
his rivals occurs with positive probability. However, this hypothesis clashes 
with the logic of iterated deletion, which assumes, precisely, that eliminated 
strategies are not expected to occur. 

Brandenburger, Friedenberg, and Keisler [2004] resolve this paradox in the context
of iterated deletion of weakly dominated strategies by assuming that strategies are
not really eliminated. Rather, they assume that strategies that are weakly dominated
occur with infinitesimal (but nonzero) probability. This is formally modeled in
a framework where uncertainty is captured using a lexicographic probability system
(LPS) [Blume, Brandenburger, and Dekel 1991], whose support consists of all types.
(Recall that an LPS is a sequence \((\mu_0, \mu_1, \ldots)\) of probability measures, in this case on
type profiles, where \(\mu_1\) represents events that have infinitesimal probability relative to
\(\mu_0\), \(\mu_2\) represents events that have infinitesimal probability relative to \(\mu_1\), and so on.
Thus, a probability of (1/2, 1/3, 1/4) can be identified with a nonstandard probability
of \(1/2 + \epsilon/3 + \epsilon^2/4\), where \(\epsilon\) is an infinitesimal.) In this framework, they show that
iterated deletion of weakly dominated strategies corresponds to what they call common
assumption of rationality, where "common assumption" is a variant of "common
knowledge", and "rationality" means "does not play a weakly dominated strategy".

With iterated regret minimization, there is a conceptual problem somewhat similar 
to that of iterated deletion of weakly dominated strategies. For simplicity, suppose 
that there are only two players, 1 and 2, and that the sets of strategies \(S_1\) for player 1
and \(S_2\) for player 2 survive iterated deletion. Although the strategies in \(S_1\) minimize
regret with respect to \(S_2\) among the strategies in \(S_1\), it may well be the case that some
of the deleted strategies have even lower regret with respect to the strategies in
\(S_2\). Indeed, this is the case with the Traveler's Dilemma. As we observed above, the
strategy profile (97, 97) is the only one that survives iterated regret minimization when
p = 2. However, if player 1 knows that player 2 is playing 97, then he should play 96,
not 97! That is, among all strategies, 97 is certainly not the strategy that minimizes regret
with respect to {97}.

The approach taken by Brandenburger, Friedenberg, and Keisler [2004] does not
seem to help in resolving this problem. Assigning deleted strategies infinitesimal prob- 
ability will not make 97 a best response to a set of strategies where 97 is given very high 
probability. We deal with this problem by essentially reversing the approach taken by 
Brandenburger, Friedenberg, and Keisler. Rather than assuming common knowledge 
of rationality, we assign successively lower probability to higher orders of rationality. 
Roughly speaking, the idea is that now, with overwhelming probability, no assumptions
are made about the other players; with probability \(\epsilon\), they are assumed to be rational;
with probability \(\epsilon^2\), the other players are assumed to be rational and to believe that
they are playing rational players; and so on. (Of course, "rationality" is interpreted
here as minimizing expected regret.) This approach is consistent with the spirit of
Camerer, Ho, and Chong's [2004] cognitive hierarchy model, where the fraction of people
with kth-order beliefs declines as a function of k, although not as quickly as this
informal discussion suggests.

Since regret minimization is non-probabilistic, the formal model of a lexicographic
belief is a sequence \((S^0, S^1, \ldots)\) of sets of strategy profiles. The strategy profiles in
\(S^0\) represent the players' primary beliefs, the strategy profiles in \(S^1\) are the players'
secondary beliefs, and so on. (We can think of \(S^k\) as the support of the measure \(\mu_k\) in
an LPS.)^3 We call \(S^i\) the level-i belief of the lexicographic belief \((S^0, S^1, \ldots)\).



^3 Like LPS's, this model implicitly assumes, among other things, that players i and j have the
same beliefs about players \(j' \neq i, j\). This assumption is acceptable, given that we assume that (it is
commonly known that) all players start the iteration process by considering all strategies. To get an
epistemic characterization of a more general setting, where players' initial beliefs about other players'
strategies are not commonly known, we need a slightly more general model of beliefs, where each
Given such lexicographic beliefs, what strategy should a rational player i choose?
Clearly the most important thing is to minimize regret with respect to his primary
beliefs, \(S^0_{-i}\). But among strategies that minimize regret with respect to \(S^0_{-i}\), the best
are those strategies that also minimize regret with respect to \(S^1_{-i}\); similarly, among
strategies that minimize regret with respect to each of \(S^0_{-i}, \ldots, S^{k-1}_{-i}\), the best are those
that also minimize regret with respect to \(S^k_{-i}\). Formally, a strategy \(\sigma\) for player i is
rational with respect to a lexicographic sequence \((S^0, S^1, \ldots)\) if there exists a sequence
\((T^0, T^1, \ldots)\) of sets of strategy profiles such that \(T^0\) consists of all strategies \(\tau\) such that \(\tau_i\)
minimizes regret with respect to \(S^0_{-i}\) for all players i; \(T^k\) for \(k > 0\) is defined
inductively to consist of all strategies \(\tau \in T^{k-1}\) such that \(\tau_i\) has the least regret with
respect to \(S^k_{-i}\) among all strategies in \(T^{k-1}_i\); and \(\sigma \in \bigcap_{k=0}^{\infty} T^k_i\).^4 Of course, this definition
makes perfect sense if the lexicographic sequence is finite and has the form \((S^0, \ldots, S^k)\);
in that case we consider \((T^0, \ldots, T^k)\). Such a sequence \((S^0, \ldots, S^k)\) is called a (k + 1)st-order
lexicographic belief. It easily follows that \(\cdots \subseteq T^k \subseteq T^{k-1} \subseteq \cdots \subseteq T^0\), so
that a strategy that is rational with respect to \((S^0, S^1, \ldots)\) is also rational with respect
to each of the finite prefixes \((S^0, S^1, \ldots, S^k)\).

Up to now, we have not imposed any constraints on justifiability of beliefs. We
provide a recursive definition of justifiability. A (kth-order) lexicographic belief \((S^j)_{j \in J}\)
is justifiable if, for each \(j \in J\), the level-j belief \(S^j\) is level-j justifiable, where level-j
justifiability is defined as follows.

• To capture the intuition that players' primary beliefs are such that they make no
assumptions about the other players, we say that a belief \(S_i^0\) is level-0 justifiable
if it is the full set of strategies \(S_i\) available to player i.^5

• To capture the intuition that players' level-k belief is that the other players are
(k - 1)st-order rational, we say that a belief \(S_i^k\) is level-k justifiable if there



player has his or her own lexicographic sequence; see Section 3.5.

^4 A straightforward argument by induction shows that \(T^{k-1}\) is nonempty and compact, so that there
will be a strategy \(\tau_i\) that has the least regret with respect to \(S^k_{-i}\) among all strategies in \(T^{k-1}_i\).

^5 As we discuss in Section 3.5, we can also consider a more general model where players have prior
beliefs; in such a setting, \(S_i^0\) need not be the full set of strategies.
exists some justifiable kth-order belief \((S'^0_{-i}, \ldots, S'^{k-1}_{-i})\) such that \(S_i^k\) is the
set of rational strategies for player i with respect to \((S'^0_{-i}, S'^1_{-i}, \ldots, S'^{k-1}_{-i})\).

This notion of justifiability captures the intuition that, with probability \(\epsilon^k\), each player
\(j_k\) believes that each other player \(j_{k-1}\) is rational with respect to a kth-order belief and
believes that, with probability \(\epsilon^{k-1}\), each other player \(j_{k-2}\) is rational with respect to a
(k - 1)st-order belief and believes that, with probability \(\epsilon^{k-2}\), . . . , and with probability
\(\epsilon\) believes that each other player \(j_1\) is rational with respect to a first-order belief and
believes that each other player \(j_0\) is playing an arbitrary strategy in \(S^0\). (As usual,
"rationality" here means "minimizes regret with respect to his beliefs".)
Given these definitions, we have the following theorem.

Theorem 3.4 Let \(G = ([n], A, u)\) be a strategic game and let S be the full set of pure
or mixed strategies. Then for each \(k \in N\) there exists a unique level-k justifiable belief
\(S^k = \mathcal{RM}^k(S)\). Furthermore, \(\mathcal{RM}^\infty(S)\) is the set of rational strategies with respect
to the belief \((S^0, S^1, \ldots)\) and \(\mathcal{RM}^k(S)\) is the set of rational strategies with respect to
the belief \((S^0, S^1, \ldots, S^{k-1})\).

Proof: By definition there is a unique level-0 justifiable belief \(S^0 = S\). It inductively follows
that there exists a unique level-k justifiable belief \(S^k = \mathcal{RM}(S^{k-1}) = \mathcal{RM}^k(S)\).
The theorem then follows from the definition of rationality with respect to a lexicographic
belief. ∎

Note that the sets \((S^1, S^2, \ldots)\) are just the sets \((T^0, T^1, \ldots)\) given in the definition of
rationality.
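
The relationship between the justifiable beliefs and the sets of rational strategies can be checked numerically on the Traveler's Dilemma with p = 2. The following Python sketch is an illustration under our own simplifying assumptions: the game is symmetric, so one set of claims stands for both players at each level, and regret at each level is computed relative to the claims that survived the previous level. The function names are hypothetical.

```python
def payoff(mine, other, p=2):
    """Traveler's Dilemma payoff to the player asking `mine`."""
    if mine == other:
        return mine
    return mine + p if mine < other else other - p

def min_regret(own, others, p=2):
    """Claims in `own` with least maximum regret against `others`; best replies taken within `own`."""
    best = {o: max(payoff(a, o, p) for a in own) for o in others}
    regret = {a: max(best[o] - payoff(a, o, p) for o in others) for a in own}
    least = min(regret.values())
    return [a for a in own if regret[a] == least]

def rational_sets(beliefs, all_actions, p=2):
    """The nested sets T^0, T^1, ... for a lexicographic belief (S^0, S^1, ...)."""
    T, own = [], list(all_actions)
    for S_k in beliefs:
        own = min_regret(own, S_k, p)     # refine the survivors level by level
        T.append(own)
    return T

claims = list(range(2, 101))
beliefs = [claims]                        # S^0: no assumptions about the other player
for _ in range(3):                        # S^k obtained by one more round of regret minimization
    beliefs.append(min_regret(beliefs[-1], beliefs[-1]))
T = rational_sets(beliefs, claims)
```

In this run the beliefs are the claims 2 to 100, then 96 to 100, then 97 alone, and each rational set T^k coincides with the next belief S^(k+1), as the note above states.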

In the appendix, we provide an alternative characterization of iterated regret
minimization in terms of Kripke structures.

3.4 Examples 

We now consider the outcome of iterated regret minimization in a number of standard
games, showing how it compares to the strategies recommended by other solution concepts.
We start by considering what happens if we restrict to pure strategies, and then
consider mixed strategies.

3.4.1 Pure strategies 

Example 3.5 Traveler's Dilemma: If $G = ([n], A, \vec{u})$ is the Traveler's Dilemma, then
using the arguments sketched in the introduction, we get that $\mathcal{RM}^\infty(A) = \mathcal{RM}^2(A) =
\{100 - 2p + 1\}$ if $p < 50$. As we mentioned, (2,2) is the only action profile (and
also the only mixed strategy profile) that is rationalizable (resp., survives iterated
deletion of weakly dominated strategies, is a Nash equilibrium). On the other hand,
no actions or mixed strategies are strongly dominated; hence all actions (and mixed
strategies) survive iterated deletion of strongly dominated strategies. Thus, iterated
regret minimization is quite different from all these other approaches in the Traveler's
Dilemma, and gives results that are in line with experimental observations. ∎
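
The deletion process in this example can be replayed mechanically. The following is a minimal Python sketch, not code from the paper: the function names are hypothetical, the claims are assumed to be the integers 2 through 100, and (because the game is symmetric) both travelers are assumed to keep the same surviving set at every stage.

```python
def payoff(x, y, p=2):
    """Traveler's Dilemma payoff for asking x when the other traveler asks y."""
    if x == y:
        return x
    return x + p if x < y else y - p


def regret_minimizers(actions, p=2):
    """Actions with minimal regret when both travelers are restricted to `actions`."""
    best = {y: max(payoff(b, y, p) for b in actions) for y in actions}
    regret = {x: max(best[y] - payoff(x, y, p) for y in actions) for x in actions}
    low = min(regret.values())
    return [x for x in actions if regret[x] == low], low


def iterated_regret_minimization(p=2):
    """Delete non-minimizers until the surviving set stops changing."""
    stages = [list(range(2, 101))]
    while True:
        survivors, _ = regret_minimizers(stages[-1], p)
        if survivors == stages[-1]:
            return stages
        stages.append(survivors)


stages = iterated_regret_minimization(p=2)
print(stages[1])   # survivors of the first round of deletion
print(stages[-1])  # fixed point of the iteration
```

With penalty p = 2 the first round should leave the claims 96 through 100 and the second round should leave only 97 = 100 - 2p + 1.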

Example 3.6 Centipede Game: Another well-known game for which traditional so- 
lution concepts provide an answer that is not consistent with empirical observations is 
the Centipede Game [Rosenthal 1982]. In the Centipede Game, two players play for a
fixed number k of rounds (known at the outset). They move in turn; the first player 
moves in all odd-numbered rounds, while the second player moves in even-numbered 
rounds. At her move, a player can either stop the game, or continue playing (except at 
the very last step, when a player can only stop the game). For all t, player 1 prefers the 
stopping outcome in round 2t + 1 (when she moves) to the stopping outcome in round 
2t + 2; similarly, for all t, player 2 prefers the outcome in round 2t (when he moves) to 
the outcome in round 2t + 1. However, for all t, the outcome in round t + 2 is better 
for both players than the outcome in round t. 

Consider two versions of the Centipede Game. The first has exponential payoffs.
In this case, the utility of stopping at odd-numbered rounds $t$ is $(2^t + 1, 2^{t-1})$, while
the utility of stopping at even-numbered rounds is $(2^{t-1}, 2^t)$. Thus, if player 1 stops at
round 1, player 1 gets 3 and player 2 gets 1; if player 2 stops at round 4, then player
1 gets 8 and player 2 gets 16; if player 2 stops at round 20, then both players get over
500,000. In the version with linear payoffs with punishment $p > 1$, if $t$ is odd, the
payoff is $(t, t - p)$, while if $t$ is even, the payoff is $(t - p, t)$.

The game can be described as a strategic game where $A_i$ is the set of strategies
for player $i$ in the extensive-form game. It is straightforward to show (by backwards
induction) that the only strategy profiles that survive iterated deletion are ones where
player 1 stops at the first move and player 2 stops at the second move. These are also
the only rationalizable strategy profiles and the only Nash equilibria. In contrast, in
empirical tests (which have been done with linear payoffs), subjects usually cooperate
for a certain number of rounds, although it is rare for them to cooperate throughout
the whole game [McKelvey and Palfrey 1992; Nagel and Tang 1998]. As we now show,
with iterated regret minimization, we also get cooperation for a number of rounds
(which depends on the penalty); with exponential payoffs, we get cooperation up to
the end of the game. Our results suggest some further experimental work, with regard
to the sensitivity of the game to the payoffs.

Before going on, note that, technically, a strategy in the Centipede Game must
specify what a player does whenever he is called upon to move, including cases where
he is called upon to move after he has already stopped the game. Thus, if $t$ is odd
and $t + 2 < k$, there is more than one strategy where player 1 stops at round $t$. For
example, there is one where player 1 also stops at round $t + 2$, and another where he
continues at round $t + 2$. However, all the strategies where player 1 first stops at round
$t$ are payoff equivalent for player 1 (and, in particular, are equivalent with respect to
regret minimization). We use $[t]$ to denote the set of strategies where player 1 stops at
round $t$, and similarly for player 2. It is easy to see that in the $k$-round Centipede
Game with exponential payoffs, the unique strategy that minimizes regret is to stop at
the last possible round. On the other hand, with linear payoffs and punishment $p$, the
situation is somewhat similar to the Traveler's Dilemma. All strategies (actions) for
player 1 that stop at or after stage $k - p + 1$ have regret $p - 1$, which is minimal, but
what happens with iterated deletion depends on whether $k$ and $p$ are even or odd. For
example, if $k$ and $p$ are both even, then $\mathcal{RM}_1(A) = \{[k-p+1], [k-p+3], \ldots, [k-1]\}$
and $\mathcal{RM}_2(A) = \{[k-p+2], [k-p+4], \ldots, [k]\}$. Relative to $\mathcal{RM}(A)$, the strategies
in $[k-p+1]$ have regret $p - 2$; the remaining strategies have regret $p - 1$. Thus,
$\mathcal{RM}_1^2(A) = \mathcal{RM}_1^\infty(A) = \{[k-p+1]\}$. Similarly, $\mathcal{RM}_2^2(A) = \mathcal{RM}_2^\infty(A) = \{[k-p+2]\}$.

If, on the other hand, both $k$ and $p$ are odd, $\mathcal{RM}_1(A) = \{[k-p+1], [k-p+3], \ldots, [k]\}$
and $\mathcal{RM}_2(A) = \{[k-p], [k-p+2], \ldots, [k-1]\}$. But, here, iteration does not
remove any strategies (as here $[k-p+1]$ still has regret $p - 1$ for player 1, and $[k-p]$ still
has regret $p - 1$ for player 2). Thus, $\mathcal{RM}_1^\infty(A) = \mathcal{RM}_1(A)$ and $\mathcal{RM}_2^\infty(A) = \mathcal{RM}_2(A)$. ∎



Example 3.7 Matching pennies: Consider the matching pennies game, where
$A_1 = A_2 = \{a, b\}$, $u(a,a) = u(b,b) = (80, 40)$, and $u(a,b) = u(b,a) = (40, 80)$,
with payoffs as given in the table below:

              a           b
    a     (80, 40)    (40, 80)
    b     (40, 80)    (80, 40)

Since the players have opposite interests, there are no pure strategy Nash equilibria.
Randomizing with equal probability over both actions is the only Nash equilibrium; this
is consistent with experimental results (see, e.g., [Goeree and Holt 2001]). With regret
minimization, both actions have identical regret for both players; thus, using regret
minimization (with respect to pure strategies), both actions are viable.



Consider a variant of this game (called the asymmetric matching pennies game
[Goeree, Holt, and Palfrey 2000]) where $u(a,a) = (320, 40)$:

              a           b
    a     (320, 40)   (40, 80)
    b     (40, 80)    (80, 40)

Here, the unique Nash equilibrium is one where player 1
still randomizes between $a$ and $b$ with equal probability, but player 2 picks $b$ with
probability 0.875 [Goeree, Holt, and Palfrey 2000]. Experimental results by Goeree and Holt
[2001] show quite different results: player 1 chooses $a$ with probability 0.96; player 2, on
the other hand, is consistent with the Nash equilibrium and chooses $b$ with probability
0.86. In other words, players most of the time end up with the outcome $(a,b)$. With
iterated regret minimization, we get a qualitatively similar result. It is easy to see that
in the first round of deletion, $a$ minimizes the regret for player 1, whereas both $a$ and
$b$ minimize the regret for player 2; thus $\mathcal{RM}_1^1(A) = \{a\}$ and $\mathcal{RM}_2^1(A) = \{a, b\}$. In the
second round of the iteration, $b$ is the only action that minimizes regret for player 2.
Thus, $\mathcal{RM}^2(A) = \mathcal{RM}^\infty(A) = (a, b)$; i.e., $(a, b)$ is the only strategy profile that survives
iterated deletion. ∎
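
The two rounds of deletion in the asymmetric game can be written out explicitly. The display below is a worked computation from the payoff table, where each regret is the maximum, over the other player's remaining actions, of the best achievable payoff minus the payoff actually obtained.

```latex
\begin{align*}
\text{Round 1:}\quad
  &\mathit{regret}_1(a) = \max(320 - 320,\; 80 - 40) = 40, &
  &\mathit{regret}_1(b) = \max(320 - 40,\; 80 - 80) = 280, \\
  &\mathit{regret}_2(a) = \max(80 - 40,\; 80 - 80) = 40, &
  &\mathit{regret}_2(b) = \max(80 - 80,\; 80 - 40) = 40. \\
\text{Round 2 (player 1 plays } a\text{):}\quad
  &\mathit{regret}_2(a) = 80 - 40 = 40, &
  &\mathit{regret}_2(b) = 80 - 80 = 0.
\end{align*}
```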

Example 3.8 Coordination games: Suppose that $A_1 = A_2 = \{a, b\}$, and $u(a,a) =
(k,k)$, $u(b,b) = (1,1)$, $u(a,b) = u(b,a) = (0,0)$, as shown in the table below:

              a           b
    a     (k, k)      (0, 0)
    b     (0, 0)      (1, 1)

Both $(a,a)$ and $(b,b)$ are Nash equilibria, but $(a,a)$ Pareto dominates $(b,b)$ if $k > 1$:
both players are better off with the equilibrium $(a,a)$ than with $(b,b)$. With regret
minimization, we do not have to appeal to Pareto dominance if we stick to pure
strategies. It is easy to see that if $k > 1$, $\mathcal{RM}_1(A) = \mathcal{RM}_2(A) = \mathcal{RM}_1^\infty(A) = \mathcal{RM}_2^\infty(A) =
\{a\}$ (yielding regret 1), while if $k = 1$, $\mathcal{RM}_1(A) = \mathcal{RM}_2(A) = \mathcal{RM}_1^\infty(A) =
\mathcal{RM}_2^\infty(A) = \{a, b\}$. ∎

Example 3.9 Bertrand competition: Bertrand competition is a 2-player game where
the players can be viewed as firms producing a homogeneous good. There is demand
for 100 units of the good at any price up to $200. If both firms charge the same price,
each will sell 50 units of the good. Otherwise, the firm that charges the lower price will
sell 100 units at that price. Each firm has a cost of production of 0, so as long as it sells
for a positive price, it makes a profit. It is easy to see that the only Nash equilibria of
this game are (0,0) and (1,1). But it does not seem so reasonable that firms playing
this game only once will charge $1, when they could charge up to $200. And indeed,
experimental evidence shows that people in general choose significantly higher prices
[Dufwenberg and Gneezy 2000].

Now consider regret. Suppose that firm 1 charges $n \ge 1$. If firm 2 charges $m > 1$,
then the best response is for firm 1 to charge $m - 1$. If $m > n$, then firm 1's regret is
$(m - 1 - n)100$; if $m = n > 1$, firm 1's regret is $(n/2 - 1)100$; if $m = n = 1$, firm 1's
regret is 0; and if $m < n$, firm 1's regret is $(m - 1)100$. If $m = 1$, firm 1's best response
is to charge 1, and the regret is 0 if $n = 1$ and 50 if $n > 1$. It follows that firm 1's
regret is $\max((199 - n)100, (n - 2)100)$. Clearly if $n = 0$, firm 1's regret is $199 \times 100$ (if
firm 2 charges 200). Thus, firm 1 minimizes regret by playing 100 or 101, and similarly
for firm 2. A second round of regret minimization, with respect to {100, 101}, leads to
100 as the unique strategy that results from iterated regret minimization. This seems
far closer to what is done in practice in many cases. ∎
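
The regret formula and the two rounds of deletion for Bertrand competition can be checked numerically. This is a small sketch under the assumptions stated in the example (integer prices from 0 to 200, demand for 100 units, zero cost); the helper names `profit`, `regrets`, and `minimizers` are illustrative only.

```python
def profit(n, m):
    """Profit of a firm charging n when its rival charges m."""
    if n < m:
        return 100 * n      # undercutting firm serves the whole market
    if n == m:
        return 50 * n       # equal prices split the 100 units
    return 0                # the higher-priced firm sells nothing


def regrets(own, rival):
    """Map each price in `own` to its maximal regret over the rival's prices."""
    best = {m: max(profit(b, m) for b in own) for m in rival}
    return {n: max(best[m] - profit(n, m) for m in rival) for n in own}


def minimizers(table):
    """Prices whose regret equals the minimum in `table`."""
    low = min(table.values())
    return sorted(n for n, r in table.items() if r == low)


prices = range(0, 201)
round1 = regrets(prices, prices)
first = minimizers(round1)                   # survivors of round one
second = minimizers(regrets(first, first))   # survivors of round two
print(first, round1[first[0]], second)
```

The first round should leave the prices 100 and 101 (each with regret 9,900), and the second round should leave 100.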

Example 3.10 The Nash bargaining game [Nash 1950]: In this 2-player game, each
player must choose an integer between 0 and 100. If player 1 chooses $x$ and player 2
chooses $y$ and $x + y \le 100$, then player 1 receives $x$ and player 2 receives $y$;
otherwise, both players receive 0. All strategy profiles of the form $(x, 100 - x)$ are Nash
equilibria. The problem is deciding which of these equilibria should be played. Nash
[1950] suggested a number of axioms, from which it followed that (50, 50) was the unique
acceptable strategy profile.

Using iterated regret minimization leads to the same result, without requiring
additional axioms. Using arguments much like those used in the case of Bertrand
competition, it is easy to see that the regret of playing $x$ is $\max(100 - x, x - 1)$. If the
other player plays $y \le 100 - x$, then the best response is $100 - y$ and the regret is
$100 - y - x$. Clearly, the greatest regret is $100 - x$, when $y = 0$. On the other hand,
if the other player plays $y > 100 - x$, then the best response is $100 - y$, so the regret
is $100 - y$. The greatest regret comes if $y = 100 - x + 1$, in which case the regret is
$x - 1$. It follows that regret is minimized by playing either 50 or 51. Iterating regret
minimization with respect to {50, 51} leaves us with 50. Thus, (50, 50) is the only
strategy profile that survives iterated regret minimization.

We have implicitly assumed here that the utility of a payoff of $x$ is just $x$. If, more
generally, it is $u(x)$, and $u$ is an increasing function, then the same argument shows
that the regret is $\max(u(100) - u(x), u(x - 1) - u(0))$. Again, there will be either one
value for which the regret is minimized (as there would have been above if we had
taken the total to be 99 instead of 100) or two consecutive values. A second round
of regret minimization will lead to a single value; that is, again there will be a single
strategy profile of the form $(x, x)$ that survives iterated regret minimization. However,
$x$ may be such that $2x < 100$ or $2x > 100$. This can be viewed as a consequence of
the fact that, in a strategy profile that survives iterated regret minimization, a player
is not making a best response to what the other players are doing. ∎
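
The remark about one versus two minimizers can be made concrete with linear utility. In the display below, $T$ is an illustrative symbol (not used in the original text) for the total to be divided, so the example above corresponds to $T = 100$.

```latex
\[
\mathit{regret}(x) = \max(T - x,\; x - 1), \qquad
T - x = x - 1 \iff x = \tfrac{T + 1}{2}.
\]
\[
T = 100:\ \mathit{regret}(50) = \mathit{regret}(51) = 50; \qquad
T = 99:\ \mathit{regret}(50) = 49 \text{ is the unique minimum.}
\]
```

When $T$ is even the crossing point falls between two integers, which gives the two consecutive minimizers; when $T$ is odd it is an integer, and a single value minimizes regret.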

We next show that iterated regret and Nash equilibrium agree on Prisoner's
Dilemma. This follows from a more general observation, that iterated regret always
recommends a dominant action. A dominant action $a$ for player $i$ is one such that
$u_i(a, b_{-i}) \ge u_i(a', b_{-i})$ for all $a' \in A_i$ and $b \in A$. We can similarly define a dominant
(mixed) strategy. It is easy to see that dominant actions survive iterated deletion of
weakly and of strongly dominated actions with respect to $A$, and are rationalizable.
Indeed, if there is a dominant action, the only actions that survive one round of iterated
deletion of weakly dominated strategies are dominant actions. Similar observations hold
in the case of mixed strategies. The next result shows that iterated regret minimization
acts like iterated deletion of weakly dominated strategies in the presence of dominant
actions and strategies.

Proposition 3.11 Let $G = ([n], A, \vec{u})$ be a strategic game. If player $i$ has a dominant
action $a_i$, then

(a) $\mathcal{RM}_i(A) = \mathcal{RM}_i^\infty(A)$;

(b) $\mathcal{RM}_i(A)$ consists of the dominant actions in $G$.

Proof: Every action that is dominant has regret 0 (which is minimal); furthermore,
only dominant actions have regret 0. It follows that $\mathcal{RM}_i(A)$ consists of the dominant
actions for $i$ (if such actions exist). Since none of these actions will be removed in later
deletions, it follows that $\mathcal{RM}_i(A) = \mathcal{RM}_i^\infty(A)$. ∎

Example 3.12 (Repeated) Prisoner's Dilemma: Recall that in Prisoner's Dilemma,
players can either cooperate ($c$) or defect ($d$). The payoffs are $u(d,d) = (u_1, u_1)$,
$u(c,c) = (u_2, u_2)$, $u(d,c) = (u_3, 0)$, $u(c,d) = (0, u_3)$, where $0 < u_1 < u_2 < u_3$ and $u_2 >
u_3/2$ (so that alternating between $(c,d)$ and $(d,c)$ is not as good as always cooperating).
It is well known (and easy to check) that $d$ is the only dominant action for both 1 and
2, so it follows by Proposition 3.11 that traditional solution concepts coincide with
iterated regret minimization for this game.
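
The link between dominance and zero regret is easy to see on a concrete instance. The sketch below uses illustrative payoff values (1, 3, and 4, chosen only to satisfy the stated inequalities; they are not from the paper) and computes player 1's regret for each action in one-shot Prisoner's Dilemma.

```python
# Illustrative values with 0 < U1 < U2 < U3 and U2 > U3 / 2 (an assumption).
U1, U2, U3 = 1, 3, 4

# Player 1's payoff for (own action, other player's action).
PAYOFF = {("d", "d"): U1, ("c", "c"): U2, ("d", "c"): U3, ("c", "d"): 0}


def regret(a):
    """Maximal regret of action a over the other player's actions."""
    return max(max(PAYOFF[(b, o)] for b in "cd") - PAYOFF[(a, o)] for o in "cd")


def is_dominant(a):
    """True if a does at least as well as every action against every reply."""
    return all(PAYOFF[(a, o)] >= PAYOFF[(b, o)] for b in "cd" for o in "cd")


print({a: (regret(a), is_dominant(a)) for a in "cd"})
```

Under these values, defecting is dominant and has regret 0, while cooperating has positive regret, so one round of deletion leaves only $d$.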

Things get more interesting if we consider repeated Prisoner's Dilemma. We show 
that if both players use iterated regret deletion, they will defect in every round, both 
in finitely and infinitely repeated Prisoner's Dilemma. 

First consider Prisoner's Dilemma repeated $n$ times. Let $s_{ad}$ be the strategy where
player 1 always defects, and let $S$ consist of all pure strategies in $n$-round Prisoner's
Dilemma.

Lemma 3.13 $\mathit{regret}_1^S(s_{ad}) = (n - 1)(u_3 - u_2) + \max(-u_1, u_2 - u_3)$. Moreover, if $s$ is
a strategy for player 1 where he plays $c$ before seeing player 2 play $c$ (i.e., where player
1 either starts out playing $c$ or plays $c$ at the $k$th move, for $k > 1$, after seeing player 2
play $d$ for the first $k - 1$ moves), then $\mathit{regret}_1^S(s) > (n - 1)(u_3 - u_2) + \max(-u_1, u_2 - u_3)$.

It follows from Lemma 3.13 that the only strategies that remain after one round of
deletion are strategies that start out defecting, and continue to defect as long as the
other player defects. If the players both play such a strategy, they both defect at every
round. Thus, all these strategies that survive one round of deletion survive iterated
deletion. It follows that with iterated regret minimization, we observe defection in
every round of finitely repeated Prisoner's Dilemma. Essentially the same argument
shows that this is true in infinitely repeated Prisoner's Dilemma (where payoffs are
discounted by $\delta$, for $0 < \delta < 1$). By way of contrast, while always defecting is the only
Nash equilibrium in finitely repeated Prisoner's Dilemma, the Folk Theorem shows that
for all $p$ with $0 < p < 1$, if $\delta$ is sufficiently close to 1, there is an equilibrium in which $p$ is
the fraction of times that both players cooperate. Thus, with Nash equilibrium, there
is a discontinuity between the behavior in finitely and infinitely repeated Prisoner's
Dilemma that does not occur with regret minimization. Nevertheless, intuition suggests
that there should be a way to justify cooperation using regret minimization, just as in
the case of the Centipede Game. This is indeed the case, as we show in Section 3.5. ∎

Example 3.14 Hawk-Dove: In this game, $A_1 = A_2 = \{d, h\}$; a player can choose
to be a dove ($d$) or a hawk ($h$). The payoffs are something like Prisoner's Dilemma
(with $h$ playing the role of "defect"), but the roles of $a$ and 0 are switched. Thus,
we have $u(d,d) = (b,b)$, $u(d,h) = (a,c)$, $u(h,d) = (c,a)$, $u(h,h) = (0,0)$, where again
$0 < a < b < c$. This switch of the role of $a$ and 0 results in the game having two Nash
equilibria: $(h,d)$ and $(d,h)$. But $d$ is the only action that minimizes regret (yielding
regret $c - b$). Thus, $\mathcal{RM}(A) = \mathcal{RM}^\infty(A) = (d, d)$. ∎

Example 3.15 In all the examples considered thus far, there were no strongly dom- 
inated strategies, so all strategies survived iterated deletion of strongly dominated 
strategies. The game described below shows that the strategies that survive iterated 
deletion of strongly dominated strategies can be disjoint from those that survive iter- 
ated regret minimization. 





              x           y
    a     (0, 100)    (0, 0)
    b     (1, 0)      (1, 1)

First consider iterated deletion of strongly dominated strategies. For player 1, $b$
strongly dominates $a$. Once $a$ is deleted, $y$ strongly dominates $x$ for player 2. Thus,
iterated deletion of strongly dominated strategies leads to the unique strategy profile
$(b, y)$. Now consider regret minimization. The regret of $x$ is less than that of $y$, while
the regret of $b$ is less than that of $a$. Thus, iterated regret minimization leads to $(b, x)$. ∎

Footnote to Example 3.14: This game is sometimes known as Chicken.
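
The contrast between the two deletion procedures in Example 3.15 can be reproduced with a few lines of code. The following sketch encodes the payoff table of that example and runs both procedures to a fixed point; the function names and the simultaneous-update loop are illustrative choices, not taken from the paper.

```python
# Payoffs from the table in Example 3.15: (row player, column player).
U = {("a", "x"): (0, 100), ("a", "y"): (0, 0),
     ("b", "x"): (1, 0),   ("b", "y"): (1, 1)}


def u(i, own, other):
    """Payoff to player i (0 = row, 1 = column) for own action vs. the other's."""
    return U[(own, other)][0] if i == 0 else U[(other, own)][1]


def strict_dominance_step(i, own, other):
    """Drop actions of player i that some other action strictly dominates."""
    return [a for a in own
            if not any(all(u(i, b, o) > u(i, a, o) for o in other) for b in own)]


def regret_step(i, own, other):
    """Keep the actions of player i with minimal regret against `other`."""
    reg = {a: max(max(u(i, b, o) for b in own) - u(i, a, o) for o in other)
           for a in own}
    return [a for a in own if reg[a] == min(reg.values())]


def iterate(step, rows=("a", "b"), cols=("x", "y")):
    """Apply `step` to both players simultaneously until nothing changes."""
    rows, cols = list(rows), list(cols)
    while True:
        new_rows, new_cols = step(0, rows, cols), step(1, cols, rows)
        if (new_rows, new_cols) == (rows, cols):
            return rows, cols
        rows, cols = new_rows, new_cols


print(iterate(strict_dominance_step))  # survivors of iterated strict dominance
print(iterate(regret_step))            # survivors of iterated regret minimization
```

The two calls should return disjoint profiles, $(b, y)$ and $(b, x)$ respectively, matching the discussion in the example.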

Note that in the examples above, the deletion process converges after two steps. We
can construct examples of games where we need $\max(|A_1| - 1, \ldots, |A_n| - 1)$ deletion
steps. The following example shows this in the case that $n = 2$ and $|A_1| = |A_2|$, and
arguably illustrates some problems with iterated regret minimization.

Example 3.16 Consider a symmetric 2-player game with $A_1 = A_2 = \{a_1, \ldots, a_n\}$. If
both players play $a_k$, then the payoff for each one is $k$. For $k > 1$, if one player plays
$a_k$ and the other plays $a_{k-1}$, then the player who plays $a_k$ gets $-2$, and the other gets
0. In all other cases, both players get a payoff of 0. The $ij$ entry of the following matrix
describes player $h$'s payoff if $h$ plays $a_i$ and player $3 - h$ plays $a_j$:

$$\begin{pmatrix}
1 & 0 & 0 & \cdots & 0 & 0 \\
-2 & 2 & 0 & \cdots & 0 & 0 \\
0 & -2 & 3 & \cdots & 0 & 0 \\
\vdots & & \ddots & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & -2 & n
\end{pmatrix}$$

Note that $a_n$ has regret $n + 1$ (if, for example, player 1 plays $a_n$ and player 2 plays
$a_{n-1}$, player 1 could have gotten $n + 1$ more by playing $a_{n-1}$); all other actions have
regret $n$. Thus, only $a_n$ is eliminated in the first round. Similar considerations show
that we eliminate $a_{n-1}$ in the next round, then $a_{n-2}$, and so on. Thus, $a_1$ is the only
pure strategy that survives iterated regret minimization.

Note that $(a_k, a_k)$ is a Nash equilibrium for all $k$. Thus, the strategy that survives
iterated regret minimization is the one that is Pareto dominated by all other Nash
equilibria. We get a similar result if we modify the payoffs so that if both players play
$a_k$, then they both get $2^k$, while if one player plays $a_k$ and the other plays $a_{k-1}$, then
the one that plays $a_k$ gets $-2 - 2^{k-1}$, and the other gets 0. Suppose that $n = 20$ in the
latter game. Would players really accept a payoff of 2 when they could get a payoff of
over 1,000,000 if they could coordinate on $a_{20}$? Perhaps they would not play $a_{20}$ if they
were concerned about the loss they would face if the other player played $a_{19}$.
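
The number of deletion rounds in the first game of Example 3.16 (diagonal payoff $k$, penalty $-2$) can be counted directly. This is a minimal sketch with hypothetical function names; it relies on the symmetry of the game, so that both players keep the same surviving set.

```python
def payoff(i, j):
    """Payoff for playing a_i against a_j in Example 3.16 (indices start at 1)."""
    if i == j:
        return i
    if i == j + 1:      # a_k played against a_{k-1}
        return -2
    return 0


def deletion_rounds(n):
    """Return the surviving actions and the number of rounds that deleted something."""
    actions = list(range(1, n + 1))
    rounds = 0
    while True:
        regret = {a: max(max(payoff(b, o) for b in actions) - payoff(a, o)
                         for o in actions) for a in actions}
        low = min(regret.values())
        survivors = [a for a in actions if regret[a] == low]
        if survivors == actions:
            return actions, rounds
        actions, rounds = survivors, rounds + 1


print(deletion_rounds(5))
print(deletion_rounds(20))
```

Each round removes only the highest remaining action, so with $n$ actions the process should take $n - 1$ rounds and end with $a_1$ alone.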

The following variant of a generalized coordination game demonstrates the same 
effect even without iteration. 





              a            b
    a     (1, 1)       (0, -10)
    b     (-10, 0)     (10, 10)

Clearly $(b,b)$ is the Pareto optimal Nash equilibrium, but playing $b$ has regret 11,
whereas $a$ has regret only 10; thus $(a,a)$ is the only profile that minimizes regret.
Note, however, that $(a,a)$ is the risk dominant Nash equilibrium. (Recall that in a
generalized coordination game, that is, a 2-player, 2-action game where $u_1(a,a) > u_1(b,a)$,
$u_1(b,b) > u_1(a,b)$, $u_2(a,a) > u_2(a,b)$, and $u_2(b,b) > u_2(b,a)$, the Nash equilibrium
$(a,a)$ is risk dominant if the product of the "deviation losses" for $(a,a)$ (i.e.,
$(u_1(b,a) - u_1(a,a))(u_2(a,b) - u_2(a,a))$) is higher than the product of the deviation
losses for $(b,b)$.) In the game above, the product of the deviation losses for $(b,b)$ is
$100 = (0 - 10)(0 - 10)$, while the product of the deviation losses for $(a,a)$ is
$121 = (-10 - 1)(-10 - 1)$; thus, $(a,a)$ is risk dominant. In fact, in every generalized
coordination game, the product of the deviation losses for $(x,x)$ is
$\mathit{regret}_1(y)\,\mathit{regret}_2(y)$, where $y$ is the action other than $x$, so if the game is
symmetric (i.e., if $u_1(x,y) = u_2(y,x)$, which implies that $\mathit{regret}_1(x) = \mathit{regret}_2(x)$),
the risk dominant Nash equilibrium is the only action profile that minimizes regret.
(It is easy to see that this is no longer the case if the game is asymmetric.) ∎
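
The numbers quoted for this 2-by-2 game can be recomputed from the table. The sketch below is illustrative only: `deviation_loss_product(x)` is a hypothetical helper that multiplies, for the profile $(x, x)$, each player's payoff change from unilaterally switching to the other action.

```python
# Payoffs (player 1, player 2); player 1 picks the row, player 2 the column.
U = {("a", "a"): (1, 1),   ("a", "b"): (0, -10),
     ("b", "a"): (-10, 0), ("b", "b"): (10, 10)}
ACTIONS = ("a", "b")


def regret1(x):
    """Regret of player 1's action x."""
    return max(max(U[(r, c)][0] for r in ACTIONS) - U[(x, c)][0] for c in ACTIONS)


def regret2(x):
    """Regret of player 2's action x."""
    return max(max(U[(r, c)][1] for c in ACTIONS) - U[(r, x)][1] for r in ACTIONS)


def deviation_loss_product(x):
    """Product of both players' losses from unilaterally leaving (x, x)."""
    y = "b" if x == "a" else "a"
    return (U[(y, x)][0] - U[(x, x)][0]) * (U[(x, y)][1] - U[(x, x)][1])


print(regret1("a"), regret1("b"), regret2("a"), regret2("b"))
print(deviation_loss_product("a"), deviation_loss_product("b"))
```

The regrets should come out as 10 for $a$ and 11 for $b$ for both players, and the deviation-loss products as 121 for $(a,a)$ and 100 for $(b,b)$.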

3.4.2 Mixed strategies 

Applying regret in the presence of mixed strategies can lead to quite different results
than if we consider only pure strategies. We first generalize Proposition 3.11 to show
that if there is a dominant strategy that happens to be pure (as is the case, for example,
in Prisoner's Dilemma), nothing changes in the presence of mixed strategies. But in
general, things can change significantly.

Proposition 3.17 Let $G = ([n], A, \vec{u})$ be a strategic game. If player $i$ has a dominant
action $a_i$, then $\mathcal{RM}_i(S) = \mathcal{RM}_i^\infty(S) = \Delta(\mathcal{RM}_i(A))$, where $S$ is the set of mixed
strategies.

Proof: The argument is similar to the proof of Proposition 3.11 and is left to the reader. ∎

To understand what happens in general, we first show that we need to consider
regret relative to only pure strategies when minimizing regret at the first step. (The
proof is relegated to the Appendix.)

Proposition 3.18 Let $G = ([n], A, \vec{u})$ be a strategic game and let $\sigma_i$ be a mixed strategy
for player $i$. Then $\mathit{regret}_i^S(\sigma_i) = \max_{a_{-i} \in A_{-i}} \mathit{regret}_i^S(\sigma_i \mid a_{-i})$.

Example 3.19 Roshambo (Rock-Paper-Scissors): In the rock-paper-scissors game,
$A_1 = A_2 = \{r, s, p\}$; rock ($r$) beats scissors ($s$), scissors beats paper ($p$), and paper
beats rock; $u(a,b)$ is (2,0) if $a$ beats $b$, (0,2) if $b$ beats $a$, and (1,1) if $a = b$. If we
stick with pure strategies, by symmetry, we have $\mathcal{RM}_1(A) = \mathcal{RM}_2(A) = \mathcal{RM}_1^\infty(A) =
\mathcal{RM}_2^\infty(A) = \{r, s, p\}$. If we move to mixed strategies, it follows by Proposition 3.18
that picking $r$, $s$, and $p$ each with probability 1/3 is the only strategy that minimizes
regret. (This is also the only Nash equilibrium.) ∎
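
Proposition 3.18 makes the regret of a mixed strategy cheap to evaluate, since only the opponent's pure actions need to be considered. The sketch below applies this to rock-paper-scissors and searches a finite grid of mixed strategies; the grid step of 1/30 is an arbitrary illustrative choice, so the search confirms the claim only on that grid.

```python
from fractions import Fraction

ACTIONS = ("r", "s", "p")
BEATS = {("r", "s"), ("s", "p"), ("p", "r")}


def u1(a, b):
    """Player 1's payoff: 2 for a win, 1 for a tie, 0 for a loss."""
    if a == b:
        return 1
    return 2 if (a, b) in BEATS else 0


def regret(mix):
    """Regret of a mixed strategy, maximizing over the opponent's pure actions."""
    worst = 0
    for b in ACTIONS:
        best = max(u1(a, b) for a in ACTIONS)
        expected = sum(q * u1(a, b) for a, q in zip(ACTIONS, mix))
        worst = max(worst, best - expected)
    return worst


# All probability vectors over (r, s, p) whose entries are multiples of 1/30.
N = 30
grid = [(Fraction(i, N), Fraction(j, N), Fraction(N - i - j, N))
        for i in range(N + 1) for j in range(N + 1 - i)]
best_mix = min(grid, key=regret)
print(best_mix, regret(best_mix))
```

On this grid the uniform strategy is the unique minimizer, with regret 1, whereas every pure strategy has regret 2.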

Example 3.20 Matching pennies with mixed strategies: Consider again the matching
pennies game, where $A_1 = A_2 = \{a, b\}$, and $u(a,a) = u(b,b) = (80, 40)$, $u(a,b) =
u(b,a) = (40, 80)$. Recall that $\mathcal{RM}_1^\infty(A) = \mathcal{RM}_2^\infty(A) = \{a, b\}$. Consider a mixed
strategy that puts weight $p$ on $a$. By Proposition 3.18, the regret of this strategy
is $\max(40(1 - p), 40p)$, which is minimized when $p = 1/2$ (yielding regret 20). Thus
randomizing with equal probability over $a$, $b$ is the only strategy that minimizes regret;
it is also the only Nash equilibrium. But now consider the asymmetric matching pennies
game, where $u(a,a) = (320, 40)$. Recall that $\mathcal{RM}^\infty(A) = (a, b)$. Since the utilities
have not changed for player 2, it is still the case that $\frac{1}{2}a + \frac{1}{2}b$ is the only strategy
that minimizes regret for player 2. On the other hand, by Proposition 3.18, the regret
of the strategy for player 1 that puts weight $p$ on $a$ is $\max(280(1 - p), 40p)$, which is
minimized when $p = 0.875$. Thus $\mathcal{RM}^\infty(S) = (0.875a + 0.125b,\ 0.5a + 0.5b)$. ∎

Example 3.21 Coordination games with mixed strategies: Consider again the
coordination game where $A_1 = A_2 = \{a, b\}$ and $u(a,a) = (k,k)$, $u(b,b) = (1,1)$, $u(a,b) =
u(b,a) = (0,0)$. Recall that if $k > 1$, then $\mathcal{RM}_1^\infty(A) = \mathcal{RM}_2^\infty(A) = \{a\}$, while if $k = 1$,
then $\mathcal{RM}_1^\infty(A) = \mathcal{RM}_2^\infty(A) = \{a, b\}$. Things change if we consider mixed strategies.
Consider a mixed strategy that puts weight $p$ on $b$. By Proposition 3.18, the regret of
this strategy is $\max(kp, 1 - p)$, which is minimized when $p = \frac{1}{k+1}$ (yielding regret $\frac{k}{k+1}$).
Thus, mixed strategies that minimize regret can put positive weight on actions that
have sub-optimal regret. ∎
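
The minimizing weight in the coordination game follows from a one-line calculation. Since $kp$ is increasing in $p$ and $1 - p$ is decreasing, the maximum of the two is smallest where they cross:

```latex
\[
kp = 1 - p \iff p = \frac{1}{k + 1}, \qquad
\text{with regret } k \cdot \frac{1}{k + 1} = \frac{k}{k + 1} < 1 .
\]
```

For comparison, the pure strategy $a$ has regret 1, so for every $k \ge 1$ this mixed strategy does strictly better than any pure strategy.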

Example 3.22 Traveler's Dilemma with mixed strategies: As we saw earlier, each of
the choices 96-100 has regret 3 relative to other pure strategies. It turns out that there
are mixed strategies that have regret less than 3. Consider the mixed strategy that puts
probability 1/2 on 100, 1/4 on 99, and decreases exponentially, putting probability $1/2^{98}$
on both 3 and 2. Call this mixed strategy $\sigma$. Let $S$ consist of all the mixed strategies
for Traveler's Dilemma.

Lemma 3.23 $\mathit{regret}_1^S(\sigma) < 3$.

The proof of Lemma 3.23 shows that $\mathit{regret}_1^S(\sigma)$ is not much less than 3; it is
roughly $3 \times (1 - 1/2^{98})$. Nor is $\sigma$ the strategy that minimizes regret. For example,
it follows from the proof of Lemma 3.23 that we can do better by using a strategy that
puts probability 1/2 on 98 and decreases exponentially from there. While we have not
computed the exact strategy that minimizes (or strategies that minimize) regret (the
computation is nontrivial and does not add much insight), we can make two useful
observations:

• The mixed strategy that minimizes regret places probability at most $3/(99 - k)$
on the pure strategies $k$ or less. For suppose player 1 places probability $\alpha$ on the
pure strategies $k$ or less. If player 2 plays 100, player 1 could have gotten 101
by playing 99, and gets at most $k + 2$ by playing $k$ or less. Thus, the regret is
at least $(99 - k)\alpha$, which is at least 3 if $\alpha \ge 3/(99 - k)$. Thus, for example, the
probability that 90 or less is played is at most 1/3.

• The strategy that minimizes regret has regret greater than 2.9. To see this, note
that the strategy can put probability at most 3/97 on 2 and at most 3/96 on 2 and 3
together. This means that the regret relative to 3 is at least

$$(1 - 3/96)\,3 + (3/96 - 3/97) = 3 - 6/96 - 3/97 > 2.9.$$
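
The regret of the exponentially decreasing strategy described before Lemma 3.23 can be evaluated exactly with rational arithmetic. The sketch below assumes penalty 2 and claims from 2 to 100, and uses Proposition 3.18 to maximize only over the other traveler's pure claims; the names `sigma` and `regret_against` are illustrative.

```python
from fractions import Fraction


def payoff(x, y, p=2):
    """Traveler's Dilemma payoff for asking x when the other traveler asks y."""
    if x == y:
        return x
    return x + p if x < y else y - p


# Probability 1/2 on 100, 1/4 on 99, ..., 1/2**98 on 3, and the remainder on 2.
sigma = {100 - j: Fraction(1, 2 ** (j + 1)) for j in range(98)}
sigma[2] = 1 - sum(sigma.values())


def regret_against(y):
    """Regret of sigma when the other traveler asks for y."""
    best = max(payoff(x, y) for x in range(2, 101))
    expected = sum(q * payoff(x, y) for x, q in sigma.items())
    return best - expected


regret_sigma = max(regret_against(y) for y in range(2, 101))
print(float(regret_sigma), regret_sigma < 3)
```

Under these assumptions the worst case arises when the other traveler asks for 3, and the regret is below 3 only by an amount on the order of $2^{-98}$.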

The fact that it is hard to compute the exact strategy that minimizes regret suggests
that people are unlikely to be using it. On the other hand, it is easy to compute that
the optimal strategy puts high weight on actions in the high 90's. In retrospect, it
is also not surprising that one can come close to minimizing regret by putting some
weight on (almost) all actions. This was also the case in Example 3.21; as we observed
there, we can sometimes do better by putting some weight even on actions that do
not minimize regret. Interestingly, the distribution of strategies observed by Becker,
Carter, and Naeve [2005] is qualitatively similar to the distribution induced by a mixed
strategy that is close to optimal. If everyone in the population were playing a mixed
strategy that was close to optimal in terms of minimizing regret, we would expect to
see something close to the observed distribution. ∎

Footnote: It is not necessarily the case that the support of the optimal strategy consists of all
actions, or even all undominated actions. For example, consider a coordination game with three
actions $a$, $b$, and $c$, where $u(a,a) = u(b,b) = k$, $u(c,c) = 1$, and $u(x,y) = 0$ if $x \ne y$. If $k > 2$,
then the strategy that minimizes regret places probability 1/2 on each of $a$ and $b$, and probability
0 on $c$.

The phenomena observed in the previous two examples apply to all the other
examples considered in Section 3.4.1. For example, in the Bertrand competition, while the
pure strategies of least regret (100 and 101) have regret 9,900, there are mixed
strategies with regret at most 7,900 (e.g., by putting probability 1/5 on each of 80, 100,
120, 140, and 160). We can do somewhat better than this, but not much. Moreover,
we believe that in both Traveler's Dilemma and in Bertrand Competition there is a
unique mixed strategy that minimizes regret, so that one round of deletion will suffice.
This is not true in general, as the following example shows.

Example 3.24 Consider a 2-player symmetric game where $A_1 = A_2 = \{a_{mk} : m =
1, \ldots, n,\ k = 1, 2\}$. Define

$$u_1(a_{ij}, a_{kl}) = \begin{cases}
-3^{\max(i,k)} & \text{if } i \ne k \\
0 & \text{if } i = k,\ j = l \\
-3^{i+1} & \text{if } i = k,\ j \ne l.
\end{cases}$$

Let $S$ consist of all mixed strategies in this game. We claim that, for every strategy $\sigma$
for player 1, $\mathit{regret}_1^S(\sigma) \ge 3^n$, and similarly for player 2. To see this, consider a mixed
strategy of the form $\sum_{ij} p_{ij} a_{ij}$. The best response to $a_{nj}$ is $a_{nj}$, which gives a payoff of
0. Thus, the regret of this strategy relative to $a_{n1}$ is $3^n(\sum_{ij,\, i \ne n} p_{ij} + 3p_{n2})$. Similarly,
the regret relative to $a_{n2}$ is $3^n(\sum_{ij,\, i \ne n} p_{ij} + 3p_{n1})$. Thus, the sum of the regrets relative
to $a_{n1}$ and $a_{n2}$ is $3^n(2 + p_{n1} + p_{n2})$. It follows that the regret relative to one of $a_{n1}$ and
$a_{n2}$ is at least $3^n$. It also easily follows that every convex combination of strategies $a_{ij}$
with $i < n$ has regret exactly $3^n$ (the regret relative to $a_{n1}$ is $3^n$, and the regret relative
to every other strategy is no worse). Moreover, every strategy that puts positive weight
on $a_{n1}$ or $a_{n2}$ has regret greater than $3^n$. Thus, at the first step, we eliminate all and
only strategies that put positive weight on $a_{n1}$ or $a_{n2}$. An easy induction shows that at
the $k$th step we eliminate all and only strategies that put positive weight on $a_{(n-k+1)1}$
or $a_{(n-k+1)2}$. After $n - 1$ steps of iterated deletion, the only strategies that remain are
convex combinations of $a_{11}$ and $a_{12}$. One more step of deletion leaves us with
$\frac{1}{2}a_{11} + \frac{1}{2}a_{12}$. ∎

Footnote: In this example, at every step but the last step, the set of strategies that remain consists
of all convex combinations of a subset of pure strategies. But this is not necessarily the case. If we
replace $3^k$ by $2^k$ in all the utilities above, then we do not eliminate all strategies that put positive
weight on $a_{n1}$ or $a_{n2}$; in particular, we do not eliminate strategies that put the same weight on
$a_{n1}$ and $a_{n2}$ (i.e., where $p_{n1} = p_{n2}$).

In the case of pure strategies, it is immediate that there cannot be more than
$|A_1| + \cdots + |A_n|$ rounds of deletion, although we do not have an example that requires
more than $\max(|A_1|, \ldots, |A_n|)$ rounds. Example 3.24 shows that $\max(|A_1|, \ldots, |A_n|)$
may be required with mixed strategies, but all that follows from Theorem 3.3 is that
the deletion process converges after at most countably many steps. We conjecture that,
in fact, the process converges after at most $\max(|A_1|, \ldots, |A_n|)$ steps, both with pure
and mixed strategies, but we have not proved this.

3.5 Iterated regret minimization with prior beliefs 

We have assumed that we start the deletion process with all pure (resp., mixed) strategy
profiles. Moreover, we have assumed that, at all stages in the deletion process (and,
in particular, at the beginning), the set of strategies that the agents consider possible
is the same for all agents. More generally, we could allow each agent $i$ to start a
stage in the deletion process with a set $\mathcal{S}^i$ of strategy profiles. Intuitively, the strategies
in $\mathcal{S}^i_j$ are the only strategies that $i$ is considering for $j$. For $j \ne i$, it is perhaps
most natural to think of $\mathcal{S}^i_j$ as representing $i$'s beliefs about what strategies $j$ will use;
however, it may also make sense to interpret $\mathcal{S}^i_j$ as a representative set of $j$'s strategies
from $i$'s point of view, or as the only ones that $i$ is considering because it is too complicated
to consider them all (see below). For $i = j$, $\mathcal{S}^i_i$ is the set of strategies that $i$ is still
considering using; thus, $i$ essentially ignores all strategies other than those in $\mathcal{S}^i_i$. When
we do regret minimization with respect to a single set $S$ of strategy profiles (as we do
in the definition of iterated regret minimization), we are implicitly assuming that the
players have common beliefs.

The changes required to deal with this generalization are straightforward: each
agent simply applies the standard regret minimization operator to his set of strategy
profiles. More formally, the generalized regret minimization $\mathcal{RM}'$ takes as an
argument a tuple $(\Pi_1, \ldots, \Pi_n)$ of strategy profiles and returns such a tuple; we define

$$\mathcal{RM}'(\Pi_1, \ldots, \Pi_n) = (\mathcal{RM}_1(\Pi_1), \ldots, \mathcal{RM}_n(\Pi_n)).$$



Footnote: As we hinted in Section 3.3, an epistemic justification of this more general notion of regret
minimization would require a more general notion of lexicographic beliefs, where each player has a separate
sequence of beliefs.

Example 3.25 Repeated Prisoner's Dilemma with prior beliefs: The role of prior
beliefs is particularly well illustrated in Repeated Prisoner's Dilemma. In the proof of
the earlier lemma on Repeated Prisoner's Dilemma, to show that the regret of a strategy like
Tit for Tat is greater than $(n-1)(u_3 - u_2) + \max(-u_1, u_2 - u_3)$, it is necessary to consider
a strategy where player 2 starts out by defecting, and then cooperates as long as player 1 defects.
This seems like an extremely unreasonable strategy for player 2 to use! Given that there are
$2^{2^n - 1}$ pure strategies for each player in n-round Prisoner's Dilemma, and computing the regret of
each one can be rather complicated, it is reasonable for the players to focus on a much
more limited set of strategies. Suppose that each player believes that the other player
is using a strategy where he plays Tit for Tat for some number k of rounds, and then
defects from then on. Call this strategy $s_k$. (So, in particular, $s_0 = s_{ad}$, the strategy of
always defecting, and $s_n$ is Tit for Tat.) Let $S^*_i$ consist of all the strategies $s_k$ for player i; let
player $3 - i$'s strategy set be any set of strategies for player $3 - i$ that includes $S^*_{3-i}$. It is easy
to see that the best response to $s_0$ is $s_0$, and the best response to $s_k$ for $k \ge 1$ is $s_{k-1}$
(i.e., you are best off defecting just before the other player starts to defect). Thus, relative to
these sets, the regret of $s_k$ for player 1 when player 2 plays $s_l$ is

$$\mathit{regret}_1(s_k \mid s_l) =
\begin{cases}
(l - k - 1)(u_2 - u_1) & \text{if } k < l \\
u_3 + u_1 - 2u_2 & \text{if } k = l > 0 \\
u_3 + u_1 - u_2 & \text{if } k > l > 0 \\
u_1 & \text{if } k > l = 0 \\
0 & \text{if } k = l = 0.
\end{cases}$$

It follows that

$$\mathit{regret}_1(s_k) =
\begin{cases}
\max((n - k - 1)(u_2 - u_1),\; u_3 + u_1 - u_2) & \text{if } k \ge 2 \\
\max((n - 1)(u_2 - u_1),\; u_3 + u_1 - 2u_2,\; u_1) & \text{if } k = 1 \\
n(u_2 - u_1) & \text{if } k = 0.
\end{cases}$$

Intuitively, if player 1 plays $s_k$ and player 2 is playing a strategy in $S^*_2$, then player 1's
regret is maximized if player 2 plays either $s_n$ (in which case 1 would have been better
off by continuing to cooperate longer) or $s_{k-1}$ (assuming that $k \ge 1$),
in which case 1 would have been better off by defecting earlier. Thus, the strategy that
minimizes player 1's regret is either $s_{n-1}$, $s_1$, or $s_0$. (This is true whatever strategies
player 1 is considering for himself, as long as they include these strategies.) If n is
sufficiently large, then it will be $s_{n-1}$. This seems intuitively reasonable. In the long
run, a long stretch of cooperation pays off, and minimizes regret. Moreover, it is not
hard to show that allowing mixtures over $s_0, \ldots, s_n$ makes no difference; for large n,
$s_{n-1}$ is still the unique strategy that minimizes regret.

To summarize, if each player i believes that the other player $3 - i$ is playing a
strategy in $S^*_{3-i}$ (a reasonable set of strategies to consider), then we get a strategy
that looks much more like what people do in practice. ∎

Thinking in terms of beliefs makes it easy to relate iterated regret to other notions
of equilibrium. Suppose that there exists a strategy profile $\sigma$ such that player i's beliefs
have the form $\Sigma_i \times \{\sigma_{-i}\}$. That is, player i believes that each of the other players is
playing his component of $\sigma$, and there are no constraints on his own choice of strategy.
Then it is easy to see that the strategies that minimize player i's regret with respect to
these beliefs are just the best responses to $\sigma_{-i}$. In particular, if $\sigma$ is a Nash equilibrium,
then $(\sigma, \ldots, \sigma) \in \mathcal{RM}'(\Sigma_1 \times \{\sigma_{-1}\}, \ldots, \Sigma_n \times \{\sigma_{-n}\})$. The key point here is that if the agent's
beliefs are represented by a "small" set, then the agent makes a best response in the
standard sense by minimizing regret; minimizing regret with respect to a "large" belief
set looks more like traditional regret minimization.
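
The contrast between "small" and "large" belief sets can be made concrete with a minimal sketch. It is an illustration only, not part of the formal development: it assumes a two-player game with pure strategies, uses the Traveler's Dilemma with an assumed penalty of 2 as the test game, and the names `traveler_payoff` and `regret_minimizers` are hypothetical.

```python
def traveler_payoff(mine, theirs, penalty=2):
    """Payoff to the player claiming `mine` when the other claims `theirs`.

    Assumed rules: both receive the lower claim; the lower claimant gains
    `penalty` and the higher claimant loses it.
    """
    if mine == theirs:
        return mine
    if mine < theirs:
        return mine + penalty
    return theirs - penalty


def regret_minimizers(own, beliefs, payoff):
    """Strategies in `own` that minimize maximum regret against `beliefs`."""
    own, beliefs = list(own), list(beliefs)
    # Best payoff achievable (within `own`) against each believed strategy.
    best = {y: max(payoff(z, y) for z in own) for y in beliefs}
    regret = {x: max(best[y] - payoff(x, y) for y in beliefs) for x in own}
    lowest = min(regret.values())
    return [x for x in own if regret[x] == lowest]


claims = range(2, 101)
# "Small" belief set: the opponent is believed to claim 100 for sure.
print(regret_minimizers(claims, [100], traveler_payoff))
# "Large" belief set: every claim by the opponent is considered possible.
print(regret_minimizers(claims, claims, traveler_payoff))
```

With the singleton belief the only survivor is the best response to a claim of 100; with the full belief set the survivors are the claims that minimize worst-case regret.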

4 Iterated Regret Minimization in Bayesian Games 

Bayesian games are a well-known generalization of strategic games, where each agent 
is assumed to have a characteristic or some private information not known to the other 
players. This is modeled by assuming each player has a type. Typically it is assumed 
that there is a commonly known probability over the set of possible type profiles.
Thus, a Bayesian game is a tuple $([n], A, u, T, \pi)$, where, as before, $[n]$ is the set of players,
$A$ is the set of action profiles, $u$ is the profile of utility functions, $T = T_1 \times \cdots \times T_n$ is
the set of type profiles (where $T_i$ represents the set of possible types for player i), and
$\pi$ is a probability measure on $T$. A player's utility can depend, not just on the action
profile, but on the type profile. Thus, $u_i : A \times T \to \mathbb{R}$. For simplicity, we assume
that $\Pr(t_i) > 0$ for all types $t_i \in T_i$ and $i = 1, \ldots, n$ (where $t_i$ is an abbreviation of
$\{t' : t'_i = t_i\}$).

A strategy for player i in a Bayesian game is a function from player i's type to an
action in $A_i$; that is, what a player does will in general depend on his type. For a pure
strategy profile $\sigma$, player i's expected utility is

$$U_i(\sigma) = \sum_{t \in T} \pi(t)\, u_i(\sigma_1(t_1), \ldots, \sigma_n(t_n), t).$$

Player i's expected utility with a mixed strategy profile $\sigma$ is computed by computing
the expectation with respect to the probability on pure strategy profiles induced by $\sigma$.
Given these definitions, a Nash equilibrium in a Bayesian game is defined in the same
way as a Nash equilibrium in a strategic game.
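
As an illustration of the expected-utility formula for pure strategy profiles, here is a minimal sketch. The representation is an assumption made for the example: players are indexed from 0, each strategy is a dictionary from types to actions, the prior is a dictionary over type profiles, and the function name `expected_utility` and the toy game are hypothetical.

```python
from itertools import product


def expected_utility(player, strategies, type_sets, prior, utility):
    """Sum over type profiles t of prior[t] * u_player(actions at t, t)."""
    total = 0
    for t in product(*type_sets):
        # Each player's action depends only on that player's own type.
        actions = tuple(strategy[t_j] for strategy, t_j in zip(strategies, t))
        total += prior[t] * utility(player, actions, t)
    return total


# Hypothetical two-player example: each type is 0 or 1 with a uniform prior,
# each player plays his own type, and a player earns 1 when the actions match.
type_sets = [(0, 1), (0, 1)]
prior = {t: 0.25 for t in product(*type_sets)}
play_own_type = {0: 0, 1: 1}


def matching(player, actions, t):
    return 1 if actions[0] == actions[1] else 0


print(expected_utility(0, [play_own_type, play_own_type], type_sets, prior, matching))
```

The same function covers utilities that depend on the type profile, since `utility` receives `t` as its last argument.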

There are some subtleties involved in doing iterated deletion in Bayesian games. 
Roughly speaking, we need to relativize all the previous definitions so that they take 
types into account. For ease of exposition, we give the definitions for pure strategies; 
the modifications to deal with mixed strategies are straightforward and left to the 
reader. 

As before, suppose that $S = S_1 \times \cdots \times S_n$. Moreover, suppose that, for each
player i, $S_i$ is also a crossproduct, in that, for each type $t \in T_i$, there exists a set of
actions $A(t) \subseteq A_i$ such that $S_i$ consists of all strategies $\sigma$ such that $\sigma(t) \in A(t)$ for
all $t \in T_i$. For $a_{-i} \in S_{-i}$ and type vector $t$, let $u_i^{S_i}(a_{-i}, t) = \max_{a_i \in S_i} u_i(a_i, a_{-i}, t)$. For
$a_i \in S_i$, $a_{-i} \in S_{-i}$, and type profile $t$, let the regret of $a_i$ for player i given $a_{-i}$ and
type profile $t$ relative to $S_i$, denoted $\mathit{regret}_i^{S_i}(a_i \mid a_{-i}, t)$, be $u_i^{S_i}(a_{-i}, t) - u_i(a_i, a_{-i}, t)$.
Let $\mathit{regret}_i^{S}(a_i \mid t) = \max_{a_{-i} \in S_{-i}(t_{-i})} \mathit{regret}_i^{S_i}(a_i \mid a_{-i}, t)$ denote the maximum regret
of player i given $t$. The expected regret of $a_i$ given $t_i$ and $S_{-i}$ is $E[\mathit{regret}_i^{S}(a_i \mid t_i)] =
\sum_{t \in T} \Pr(t \mid t_i)\, \mathit{regret}_i^{S}(a_i \mid t)$. Let $\mathit{minregret}_i^{S}(t_i) = \min_{a_i \in S_i(t_i)} E[\mathit{regret}_i^{S}(a_i \mid t_i)]$. We
delete all those strategies that do not give an action that minimizes expected regret for
each type. Thus, let $\mathcal{RM}_i(S_i) = \{\sigma \in S_i : \mathit{regret}_i^{S}(\sigma(t_i) \mid t_i) = \mathit{minregret}_i^{S}(t_i)\}$. As
usual, we take $\mathcal{RM}(S) = \mathcal{RM}_1(S_1) \times \cdots \times \mathcal{RM}_n(S_n)$. Having defined the deletion
operator, we can apply iterated deletion as before.

Example 4.1 Second-Price Auction: A second-price auction can be modeled as a
Bayesian game, where a player's type is his valuation of the product being auctioned.
His possible actions are bids. The player with the highest bid wins the auction, but
pays only the second-highest bid. (For simplicity, we assume that in the event of a tie,
the lower-numbered player wins the auction.) If he wins, pays b (the second-highest bid),
and has valuation (type) v, his utility is $v - b$; if he does not win the auction, his utility is 0.
As is well known, in a second-price auction, the strategy where each player bids his type is
weakly dominant; hence, this strategy survives iterated regret minimization. No other strategy
can give a higher payoff, no matter what the type profile is. ∎
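
The claim that truthful bidding never leaves anything to regret can be checked by brute force. The sketch below is an illustration under simplifying assumptions that go beyond the example: two bidders, integer bids bounded by a hypothetical `max_bid`, and a flag `wins_ties` saying whether the bidder under consideration is the lower-numbered one.

```python
def second_price_utility(bid, other, value, wins_ties):
    """Utility of a bidder with valuation `value`; the winner pays `other`."""
    wins = bid > other or (bid == other and wins_ties)
    return value - other if wins else 0


def max_regret(bid, value, wins_ties, max_bid):
    """Worst-case regret of `bid` over all opposing bids in 0..max_bid."""
    bids = range(max_bid + 1)
    return max(
        # Best utility any alternative bid could have achieved against `other`.
        max(second_price_utility(alt, other, value, wins_ties) for alt in bids)
        - second_price_utility(bid, other, value, wins_ties)
        for other in bids
    )


# Bidding the valuation has regret 0 for every valuation and tie-breaking position.
print(all(max_regret(v, v, ties, 10) == 0 for v in range(8) for ties in (True, False)))
```

Other bids can do strictly worse in this model: for instance, a bidder who loses ties and shades a valuation of 4 down to 3 loses against an opposing bid of 3 that he would have profitably beaten.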

Example 4.2 First-Price Auction: In a first-price auction, the player with the highest
bid wins the auction, but pays his actual bid. Assume, for simplicity, that bids are
natural numbers, that the lower-numbered player wins the auction in the event of a
tie, that all valuations are even, and that the product is sold only if some player bids
above 0. If a player's valuation is $v$, then bidding $v'$ has regret $\max(v' - 1, v - v' - 1)$.
To see this, consider player i. Suppose that the highest bid of the other agents is $v''$,
and that the lowest-numbered agent that bids $v''$ is agent j. If $v'' < v'$, or $v'' = v'$ and
$i < j$, then i wins the bid. He may have done better by bidding lower, but the lowest
he can bid and still win is 1, so his maximum regret in this case is $v' - 1$ (which occurs,
for example, if $v'' = 0$). On the other hand, if $v'' > v'$, or $v'' = v'$ and $j < i$, then i does
not win the auction. He feels regret if he could have won the auction and still paid at
most $v$. To win, if $j < i$, he must bid $v'' + 1$, in which case his regret is $v - v'' - 1$. In
this case, $v''$ can be as small as $v'$, so his regret is at most $v - v' - 1$. If $j > i$, then he
must only bid $v''$ to win, so his regret is $v - v''$, but $v'' \ge v' + 1$, so his regret is again
at most $v - v' - 1$. It follows that bidding $v' = v/2$ is the unique action that minimizes
regret (yielding a regret of $v/2 - 1$). ∎
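
The regret formula in Example 4.2 can be confirmed by enumeration. The following sketch is illustrative and rests on assumptions not in the example: it considers two bidders only, caps bids at a hypothetical `value + 2` (high enough for the opponent to outbid any sensible bid), and uses a flag `wins_ties` for the lower-numbered position.

```python
def first_price_utility(bid, other, value, wins_ties):
    """Utility with integer bids; the winner pays his own bid."""
    if max(bid, other) == 0:
        return 0  # the product is sold only if some bid is above 0
    wins = bid > other or (bid == other and wins_ties)
    return value - bid if wins else 0


def max_regret(bid, value, wins_ties, max_bid):
    """Worst-case regret of `bid` over all opposing bids in 0..max_bid."""
    bids = range(max_bid + 1)
    return max(
        max(first_price_utility(alt, other, value, wins_ties) for alt in bids)
        - first_price_utility(bid, other, value, wins_ties)
        for other in bids
    )


def regret_minimizing_bids(value, wins_ties):
    """Return (list of regret-minimizing bids, the minimum regret)."""
    max_bid = value + 2
    regrets = {b: max_regret(b, value, wins_ties, max_bid) for b in range(max_bid + 1)}
    lowest = min(regrets.values())
    return [b for b, r in regrets.items() if r == lowest], lowest


print(regret_minimizing_bids(10, True))
print(regret_minimizing_bids(10, False))
```

In both tie-breaking positions the enumeration returns the single bid $v/2$ with regret $v/2 - 1$, and the regret of every bid $v'$ between 0 and $v$ agrees with $\max(v' - 1, v - v' - 1)$.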



5 Mechanism Design using Regret Minimization 

In this section, we show how using regret minimization as the solution concept can help
to construct efficient mechanisms. We consider mechanisms where an agent truthfully
reporting his type is the unique strategy that minimizes regret, and focus on prior-free
mechanisms (i.e., mechanisms that do not depend on the type distribution of players);
we call these regret-minimizing truthful mechanisms. Additionally, we focus on ex-post
individually-rational (IR) mechanisms. As the example below shows, regret-minimizing
truthful mechanisms can do significantly better than dominant-strategy
truthful mechanisms.

Example 5.1 Maximizing revenue in combinatorial auctions: In a combinatorial auc- 
tion, there is a set of m indivisible items that are concurrently auctioned to n bidders. 
The bidders can bid on bundles of items (and have a valuation for each such bundle); 
the auctioneer allocates the items to the bidders. The standard VCG mechanism is 
known to maximize social welfare (i.e., the allocation by the auctioneer maximizes the 
sum of the valuations of the bidders of the items they are assigned), but might yield 
poor revenue for the seller. Designing combinatorial auctions that provide good revenue 
guarantees for the seller is a recognized open problem. By using regret minimization 
as the solution concept, we can provide a straightforward solution. 

Footnote: Recall that a mechanism is ex-post individually rational if a player's utility of participating is no
less than that of not participating, no matter what the outcome is.

Consider a combinatorial first-price auction: that is, the auctioneer determines
the allocation that maximizes its revenue (based on the bidders' bids), and the winning
bidders pay what they bid. Using the same argument as for the case of a single-item
first-price auction, it follows that if a bidder's valuation of a bundle is v (where v is an
even number), bidding v/2 is the unique bid on that bundle that minimizes his regret.
Thus, in a combinatorial first-price auction, the seller is guaranteed to receive MSW/2,
where MSW denotes the maximal social welfare, that is, the maximum possible sum of
the bidders' valuations for an allocation. Clearly MSW is the most that the seller can
receive, since a rational bidder would not bid more than his valuation. To additionally
get a truthful auction with the same guarantee, change the mechanism so that the
winning bidder pays b/2 if he bids b; it immediately follows that a bidder with valuation
v for a bundle should bid v. (The mechanism is also trivially IR, as players never
pay more than their valuation.) This should be contrasted with the fact that there is
no dominant-strategy implementation (i.e., no mechanism where bidding the valuation
maximizes utility no matter what the other players bid) that guarantees even a positive
fraction of MSW as revenue. In fact, as we now show, to guarantee a fraction r of MSW
the minimum regret needs to be "large". By way of contrast, dominant strategies have
regret 0.
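
That paying half the bid makes truthful bidding the unique regret minimizer can be checked on a small instance. This sketch is an illustration under assumptions that are not in the text: a single item with two bidders rather than a full combinatorial auction, integer bids capped at a hypothetical `2 * value + 2`, the tie-breaking and no-sale rules carried over from Example 4.2, and exact arithmetic via `fractions.Fraction`.

```python
from fractions import Fraction


def half_price_utility(bid, other, value, wins_ties):
    """Utility when the winner pays half of his own bid."""
    if max(bid, other) == 0:
        return Fraction(0)  # assumed: no sale unless some bid is above 0
    wins = bid > other or (bid == other and wins_ties)
    return value - Fraction(bid, 2) if wins else Fraction(0)


def regret_minimizing_bids(value, wins_ties):
    """Return (list of regret-minimizing bids, the minimum regret)."""
    max_bid = 2 * value + 2  # the opponent can outbid anything up to 2 * value + 1
    bids = range(max_bid + 1)

    def max_regret(bid):
        return max(
            max(half_price_utility(alt, other, value, wins_ties) for alt in bids)
            - half_price_utility(bid, other, value, wins_ties)
            for other in bids
        )

    regrets = {bid: max_regret(bid) for bid in bids}
    lowest = min(regrets.values())
    return [bid for bid in bids if regrets[bid] == lowest], lowest


print(regret_minimizing_bids(6, True))
```

In this restricted setting the unique minimizer is the valuation itself, so the winner pays half his valuation, matching the MSW/2 revenue guarantee of the first-price auction.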

Lemma 5.2 An efficient, IR, regret-minimizing truthful mechanism that guarantees
the seller a fraction r of MSW as revenue has a minimum regret of at least $r\,MSW - 1$.
(In particular, a mechanism that guarantees a revenue of MSW/2 must have a
minimum regret of at least $MSW/2 - 1$, just like the first-price auction.)

Proof: The claim already holds if there is a single object and two buyers. Assume
by way of contradiction that there exists an efficient, IR, truthful auction where the
seller's revenue is at least $r\,MSW$. Since the auction is efficient and truthful, the bidder
with the higher bid b will win the auction. It follows that this bidder must pay at least
$rb$ (or else either the revenue guarantee could not be satisfied, or the auction is not
truthful), but at most b (or else the auction cannot be both truthful and IR). Thus,
player 1's regret when bidding its valuation v is at least $rv - 1$, since if player 2 bids 0,
player 1 needs to pay at least $rv$, whereas he could have paid at most 1 by bidding 1
(since with a truthful, IR mechanism, a player will never pay more than he bids). ∎

The following result shows that no regret-minimizing truthful mechanism can do
significantly better than the first-price auction in terms of maximizing revenue.

Lemma 5.3 No efficient, IR, regret-minimizing truthful mechanism can guarantee the
seller more than $((\sqrt{5} - 1)/2)\,MSW$ of revenue.

Proof: As in Lemma 5.2, consider a mechanism for the single-object case with two buyers
that has a revenue guarantee of $r\,MSW$. We claim that if player 1 has valuation $v$, then
his regret if he bids $\alpha v$ is at most $\max(\alpha v, v - r\alpha v)$. To see this, note that if player 2
bids $b < \alpha v$, then player 1 pays at most $\alpha v$ (by IR and truthfulness), which potentially
could have been saved. Thus, his regret is at most $\alpha v$ if player 2 bids less than $\alpha v$. If,
on the other hand, player 2 bids $b \ge \alpha v$, then player 1 needs to pay at least $r\alpha v$ to win
the object (by truthfulness and the revenue guarantee), so his regret is at most $v - r\alpha v$.
It is easy to see that $\alpha v = v - r\alpha v$ if $\alpha = 1/(r + 1)$. Thus, if $\alpha = 1/(r + 1)$, player 1's
regret is at most $v/(r + 1)$. (We are ignoring here the possibility that $v/(r + 1)$ is not
an integer, hence not a legal bid. As we shall see, this will not be a problem.) But, by
Lemma 5.2, player 1's regret when his valuation is $v$ can be as high as $rv - 1$. Thus,
we must have $v/(r + 1) \ge rv - 1$, or equivalently, $(r - 1/(r + 1))v \le 1$. This can be
guaranteed for all $v$ only if $r - 1/(r + 1) \le 0$, that is, $r^2 + r - 1 \le 0$, so we must have
$r \le (\sqrt{5} - 1)/2$. ∎
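
The last step of the proof can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, and the sampled valuations stand in for the "for all v" quantifier rather than proving it.

```python
import math


def regret_upper_bound(r, v):
    """Regret of bidding v / (r + 1), as bounded in the proof of Lemma 5.3."""
    return v / (r + 1)


def regret_lower_bound(r, v):
    """Regret of truthful bidding, as bounded from below in Lemma 5.2."""
    return r * v - 1


def consistent_for(r, valuations):
    """Truthful bidding can minimize regret only if this holds for every v."""
    return all(regret_upper_bound(r, v) >= regret_lower_bound(r, v) for v in valuations)


# Positive root of r^2 + r - 1 = 0, i.e. the point where r = 1 / (r + 1).
R_MAX = (math.sqrt(5) - 1) / 2

print(R_MAX)
print(consistent_for(R_MAX - 0.001, range(1, 10**6, 997)))
print(consistent_for(R_MAX + 0.01, [1000]))
```

Just below the threshold the inequality holds for every sampled valuation, while slightly above it a sufficiently large valuation violates it, which is the content of the bound.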



6 Related Work 

While the notion of regret has been well studied in the context of decision theory
(see [Hayashi 2008b] and the references therein for some discussion of the recent
work), to the best of our knowledge, there was no work on applying regret to game
theory until very recently. In the computer science literature, Hyafil and Boutilier
[2004] consider pre-Bayesian games, where each agent has a type
and a player's utility depends on both the action profile and the type profile, just as in
a Bayesian game, but now there is no probability on types. The solution concept they
use is a hybrid of Nash equilibrium and regret. Roughly speaking, they take regret with
respect to the types of the other players, but then use Nash equilibrium with respect
to the strategies of other players. That is, they define $\mathit{regret}_i^{S_i}(a_i \mid a_{-i}, t)$ as we do
(taking $S_i$ to consist of all strategies for player i), but then define $\mathit{regret}_i^{S_i}(a_i \mid a_{-i})$ by
minimizing over all $t$. They then define a profile $\sigma$ to be a minimax-regret equilibrium
if, for all type profiles $t$, no agent can decrease his regret by changing his action. For
strategic games, where there are no types (i.e., $|T| = 1$), their solution concept collapses
to Nash equilibrium. Thus, their definitions differ from ours in that they take regret
with respect to types, not with respect to the strategies of other players as we do, and
they do not iterate the regret operation.

Footnote: Hyafil and Boutilier actually consider a slightly less general setting, where the utility for player i
depends only on player i's type, not the whole type profile. Modifying their definitions to deal with
the more general setting is straightforward.

Aghassi and Bertsimas [2006] also consider pre-Bayesian games, and use a solution 
concept in the spirit of that of Hyafil and Boutilier. However, rather than using minimax 
regret, they use maximin, where a maximin action is one with the best worst-case payoff, 
taken over all the types of the other agents. Just as with the Hyafil-Boutilier notion, 
the Aghassi-Bertsimas notion collapses to Nash equilibrium if there is a single type. 

By way of contrast, we assume that players minimize regret also with respect to the
strategies of all other players. Additionally, we iterate the deletion process.

Even closer to our work is a recent paper by Renou and Schlag [2008]. Just as
we do, they focus on strategic games. Their motivation for considering regret, and
the way they do it in the case of pure strategies, is identical to ours (although they
do not iterate the deletion process). They allow prior beliefs, as in Section 3.5, and
require that these beliefs are described by a closed, convex set of strategies. They are
particularly interested in strategy profiles $\sigma$ that minimize regret for each agent with
respect to all the strategy profiles in an $\epsilon$-neighborhood of $\sigma$.

Note that, although they define regret for pure strategies, this is only a tool for
dealing with mixed strategies; they do not consider the regret of a pure strategy with
respect to a set of pure strategies, as we do, because a non-singleton set of pure strategies
is not convex. In particular, they have no analogue to our analysis of pure strategies in
Section 3.4.1. If we consider regret relative to the set of all mixed strategy profiles, then
we are just in the setting of Section 3.4.2. However, their definition of regret for mixed
strategies is different from ours. Our definition of $\mathit{regret}_i^{S_i}(\sigma_i \mid \sigma_{-i})$ does not depend on
whether $\sigma$ consists of pure strategies or mixed strategies (except that expected utility
must be used in the case of mixed strategies, rather than utility). By way of contrast,
Renou and Schlag [2008] define

$$\mathit{regret}'_i(\sigma_i \mid \sigma_{-i}) = \sum_{a \in A} \sigma_i(a_i)\, \sigma_{-i}(a_{-i})\, \mathit{regret}_i^{A_i}(a_i \mid a_{-i}),$$

where, as before, $\mathit{regret}_i^{A_i}(a_i \mid a_{-i})$ denotes the regret of player i relative to the actions
$A_i$. That is, $\mathit{regret}'_i(\sigma_i \mid \sigma_{-i})$ is calculated much like the expected utility of $\sigma$ to agent i,
in terms of the appropriate convex combination of regrets for pure strategies. Note that
$\mathit{regret}'_i$ is independent of any set $S_i$.

In general, $\mathit{regret}'_i(\sigma_i \mid \sigma_{-i})$ is quite different from $\mathit{regret}_i^{S_i}(\sigma_i \mid \sigma_{-i})$, as the following
example shows.

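Example 6.1 Consider the symmetric 2-player game where $A_1 = A_2 = \{a, b, c\}$, and
player 1's payoffs are given by the following table:

            a    b    c
       a    5    2    1
       b         3    1
       c    3    1    4

It immediately follows that $\mathit{regret}_1^{A_1}(a \mid a) = \mathit{regret}_1^{A_1}(b \mid b) = \mathit{regret}_1^{A_1}(c \mid c) = 0$;
$\mathit{regret}_1^{A_1}(a \mid b) = 0$; $\mathit{regret}_1^{A_1}(b \mid a) = 3$; $\mathit{regret}_1^{A_1}(c \mid a) = \mathit{regret}_1^{A_1}(c \mid b) = 2$; and
$\mathit{regret}_1^{A_1}(a \mid c) = \mathit{regret}_1^{A_1}(b \mid c) = 3$. If $\sigma = (1/6)a + (1/2)b + (1/3)c$, it is easy to see that
$u_1(a, \sigma) = 5/6 + 1 + 1/3 = 13/6$; $u_1(b, \sigma) = 3/2$; and $u_1(c, \sigma) = 1/2 + 1/2 + 4/3 = 7/3$.
Thus, c minimizes regret relative to $\sigma$, and $\mathit{regret}_1^{A_1}(c \mid \sigma) = 0$, $\mathit{regret}_1^{A_1}(a \mid \sigma) = 1/6$,
and $\mathit{regret}_1^{A_1}(b \mid \sigma) = 5/6$. Thus, $\mathit{regret}_1^{A_1}(c \mid \sigma) < \mathit{regret}_1^{A_1}(a \mid \sigma) < \mathit{regret}_1^{A_1}(b \mid \sigma)$.

On the other hand, $\mathit{regret}'_1(a \mid \sigma) = 3/2 + 2/3 = 5/3$; $\mathit{regret}'_1(b \mid \sigma) = 1/2 + 2/3 =
7/6$; and $\mathit{regret}'_1(c \mid \sigma) = 4/3$. Thus, $\mathit{regret}'_1(b \mid \sigma) < \mathit{regret}'_1(c \mid \sigma) < \mathit{regret}'_1(a \mid \sigma)$. ∎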
The reason for the difference between regret and regret' in this example is that the
best responses to a, b, and c are all different. To understand the difference between
the two approaches, consider an agent who is playing c, facing a population of people,
one-sixth of whom play a, half of whom play b, and the remainder play c (so that the
population is emulating strategy $\sigma$). What is the agent's regret if he plays c against
such a population? If he only plays once, and his opponent plays a, should he feel
regret 2 (if he had only played a, he would have done 2 better against that opponent),
or should he take into account, when he considers how he would have done, that he is
playing against a randomly chosen opponent from the population, not necessarily that
particular opponent (in which case his regret is 0)? Using regret' corresponds to the
first approach, while regret corresponds to the second.

As the following result shows, if we are considering the strategy that minimizes 
regret with respect to all strategies, then it does not matter which approach we take. 

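Proposition 6.2 Let $G = ([n], A, u)$ be a strategic game and let $\sigma_i$ be a mixed strategy
for player i. Then $\mathit{regret}_i^{S}(\sigma_i) = \max_{\sigma_{-i} \in S_{-i}} \mathit{regret}'_i(\sigma_i \mid \sigma_{-i})$.

Proof: By Proposition 3.18, $\mathit{regret}_i^{S}(\sigma_i) = \max_{a_{-i} \in A_{-i}} \mathit{regret}_i^{S_i}(\sigma_i \mid a_{-i})$. It also easily
follows from the definition of regret' that $\max_{\sigma_{-i} \in S_{-i}} \mathit{regret}'_i(\sigma_i \mid \sigma_{-i}) = \max_{a_{-i} \in A_{-i}} \mathit{regret}'_i(\sigma_i \mid a_{-i})$.
Thus, it suffices to show that $\mathit{regret}_i^{S_i}(\sigma_i \mid a_{-i}) = \mathit{regret}'_i(\sigma_i \mid a_{-i})$ for all $a_{-i} \in A_{-i}$.
This is straightforward. For suppose that $a' \in A_i$ is the best response to $a_{-i}$; that is,
using our earlier notation, $U_i(a', a_{-i}) = u_i^{S_i}(a_{-i})$. Then, by definition,

$$\begin{aligned}
\mathit{regret}_i^{S_i}(\sigma_i \mid a_{-i}) &= U_i(a', a_{-i}) - U_i(\sigma_i, a_{-i}) \\
&= U_i(a', a_{-i}) - \sum_{b \in A_i} \sigma_i(b)\, U_i(b, a_{-i}) \\
&= \sum_{b \in A_i} \sigma_i(b)\, \bigl(U_i(a', a_{-i}) - U_i(b, a_{-i})\bigr) \\
&= \sum_{b \in A_i} \sigma_i(b)\, \mathit{regret}_i^{S_i}(b \mid a_{-i}) \\
&= \mathit{regret}'_i(\sigma_i \mid a_{-i}).
\end{aligned}$$

∎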
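
The key identity in this proof, that against a pure opponent action the regret of a mixed strategy equals the corresponding convex combination of pure-strategy regrets, can be checked on any payoff matrix. The sketch below uses a hypothetical 3x3 matrix (not the one from Example 6.1) and hypothetical function names; `payoffs[x][y]` is assumed to be player 1's payoff for playing x against y.

```python
from fractions import Fraction


def pure_regret(payoffs, action, opp):
    """Regret of the pure action `action` against the opponent action `opp`."""
    best = max(row[opp] for row in payoffs)
    return best - payoffs[action][opp]


def mixed_regret(payoffs, mix, opp):
    """Best-response payoff minus the expected payoff of the mixture `mix`.

    A pure best response suffices, since expected payoff is linear in the mixture.
    """
    best = max(row[opp] for row in payoffs)
    expected = sum(p * payoffs[x][opp] for x, p in enumerate(mix))
    return best - expected


def mixed_regret_prime(payoffs, mix, opp):
    """Convex combination of the pure-strategy regrets, as in regret'."""
    return sum(p * pure_regret(payoffs, x, opp) for x, p in enumerate(mix))


payoffs = [[4, 0, 1], [2, 3, 0], [1, 1, 5]]  # hypothetical payoffs
mix = [Fraction(1, 6), Fraction(1, 2), Fraction(1, 3)]
print([mixed_regret(payoffs, mix, y) == mixed_regret_prime(payoffs, mix, y) for y in range(3)])
```

Exact fractions are used so that the equality check is not affected by floating-point rounding.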
Proposition 6.2 shows that, for many of the examples in Section 3.4.2, it would not
matter whether we had used regret' instead of regret. That is, it does not
matter how we define regret for mixed strategies if we have no prior knowledge. On the
other hand, the differences between the definitions become significant if there is some
prior knowledge (which is precisely the case focused on by Renou and Schlag) or if
we do iterated regret minimization. It is worth noting that
when Renou and Schlag consider regret minimization with respect to some set $\Pi =
\Pi_1 \times \cdots \times \Pi_n$ of strategy profiles, their definition is essentially our notion of generalized
regret minimization with respect to $(S_1 \times \Pi_{-1}, \ldots, S_n \times \Pi_{-n})$; that is, each agent i puts
no restriction on his own strategies. For example, if agent 1 does regret minimization
in the game of Example 6.1 with respect to the set $S_1 \times \{\sigma\}$, using our definition, he
would play c, while using regret', as suggested by Renou and Schlag, he would play b.

7 Discussion 

The need to find solution concepts that reflect more accurately how people actually 
play games has long been recognized. This is a particularly important issue because 
the gap between "descriptive" and "normative" is particularly small in game theory. 
An action is normatively the "right" thing to do only if it is the right thing to do with 
respect to how others actually play the game; thus, a good descriptive theory is an 
essential element of a good normative theory. 

There are many examples in the literature of games where Nash equilibrium and
its refinements do not describe what people do. We have introduced a new solution
concept, iterated regret minimization, that, at least in some games, seems to capture
what people are doing better than more standard solution concepts.

The outcomes of games like the Traveler's Dilemma and the Centipede Game have
sometimes been explained by assuming that a certain fraction of agents will be "altruistic",
and play the helpful action (e.g., playing 100 in Traveler's Dilemma or cooperating
in the Centipede Game) (cf. [Capra, Goeree, Gomez, and Holt 1999]). There seems to
be some empirical truth to this assumption; for example, 10 of 45 game theorists
that submitted pure strategies in the experiments of Becker, Carter, and Naeve [2005]
submitted 100. With an assumption of altruism, the strategies of many of the
remaining players can be explained as best responses to their (essentially accurate)
beliefs.

Altruism may indeed be part of an accurate descriptive theory, but to use it, we
first need to decide what the "right" action is, and also what percentage of agents
are altruistic. We also need to explain why this percentage may depend on the degree
of punishment (as it did in the experiments of Capra et al. [1999], for example).
Iterated regret minimization provides a different descriptive explanation, and has some
normative import as well. In particular, it seems the most appealing when considering
inexperienced, but intelligent, players that play a game for the first time. In such a
setting, it seems unreasonable to assume that players know what strategies the other
players are using (which is assumed by the Nash equilibrium solution concept).

While we have illustrated some of the properties of iterated regret minimization, 
we view this paper as more of a "proof of concept". There are clearly many issues we 
have left open. We mention a few of the issues we are currently exploring here. 

• As we observed in Section 3.5, some behavior is well explained by assuming that
agents start the regret minimization procedure with a subset of the set of all
strategy profiles, which can be thought of as representing the strategy profiles
that the agent is considering. But we need better motivation for where this set
is coming from.

• We have considered "greedy" deletion, where all strategies that do not minimize
regret are deleted at each step. We could instead delete only a subset of such
strategies at each step of deletion. It is well known that if we do this with iterated
deletion of weakly dominated strategies, the final set is strongly dependent on
the order of deletion. The same is true for regret minimization. Getting an
understanding of how robust the deletion process is would be of interest.

• We have focused on normal-form games and Bayesian games. It would also be
interesting to extend regret minimization to extensive-form games. A host of new
issues arise here, particularly because, as is well known, regret minimization is not
time consistent (see [Hayashi 2008a] for some discussion of the relevant issues).

• A natural next step would be to apply our solution concepts to mechanism design
beyond just auctions.



A An Epistemic Characterization Using Kripke Structures

Let $G = ([n], A, u)$ be a strategic game and let $S$ denote the full set of mixed strategies.
We consider a Kripke structure $(W, R_1, \ldots, R_n)$ for $(G, S)$, where $W$ denotes the set of
possible worlds and $R_i$ is a binary relation on $W$ partitioning $W$ into cells. Intuitively,
$(w_1, w_2) \in R_i$ means that player i cannot distinguish the worlds $w_1$ and $w_2$. Let $P_i(w)$
denote player i's cell in the world $w$; that is, $P_i(w) = \{w' : (w', w) \in R_i\}$.

At each world $w \in W$, there is an associated strategy profile $\sigma \in S$ and, for each
player i, a sequence $(B^i_0, B^i_1, \ldots)$ of subsets of player i's cell at $w$. Intuitively, $B^i_0$ are
the worlds in the cell that player i considers most likely, $B^i_1$ are less likely, and so on.
Thus, the sequence models player i's beliefs. We assume that at each world in a cell for
player i, player i uses the same strategy and has the same beliefs. Thus, a player knows
his own strategy and beliefs; his beliefs are only over the other players' strategies and
beliefs. Let $\mathit{strat}(w)$ denote the strategy profile associated with $w$. Given a set $B$ of
worlds, let $\mathit{strat}(B)$ denote the set of strategy profiles associated with the worlds in $B$.

Note that while the notion of a lexicographic belief sequence defined in Section
3.3 considers a sequence $(S_0, S_1, \ldots)$ of strategies, a belief sequence here considers a
sequence $B = (B_0, B_1, \ldots)$ of worlds. Given such a belief sequence $B = (B_0, B_1, \ldots)$, let
$\mathit{strat}(B) = (\mathit{strat}(B_0), \mathit{strat}(B_1), \ldots)$ denote the lexicographic belief sequence associated
with $B$. Player i is rational in world $w$ if player i's strategy in $w$ is rational with
respect to the lexicographic belief sequence $\mathit{strat}(B^i)$, where $B^i$ denotes player i's belief
sequence in $w$ (and rationality with respect to lexicographic sequences is defined as in
Section 3.3).

We will be interested in Kripke structures $(W, R_1, \ldots, R_n)$ that are complete; this is
a richness assumption that is analogous to one made by Brandenburger et al. [2004].
The structure $(W, R_1, \ldots, R_n)$ for $(G, S)$ is complete if

1. for every $\sigma \in S$, there exists some world $w$ such that $\mathit{strat}(w) = \sigma$; and

2. for every player i, world $w$, and sequence $(S^0_{-i}, S^1_{-i}, \ldots)$ of sets of strategy profiles
such that, for all j, $S^j_{-i} \subseteq \mathit{strat}(P_i(w))$, there exists a possible world $w' \in P_i(w)$
such that i's belief sequence at $w'$ is $B^i$ and $\mathit{strat}(B^i) = (S^0_{-i}, S^1_{-i}, \ldots)$.

Let $(W, R_1, \ldots, R_n)$ be a complete Kripke structure for $(G, S)$. Define a sequence
of sets of worlds $(W_0, W_1, \ldots)$, where $W_k$ intuitively consists of all worlds where players are
rational and have rational beliefs up until level $k - 1$. Thus, $W_k$ represents worlds
where all players use at least k levels of rationality. More formally, consider the following
sequence.

• $W_0$ is the subset of $W$ where each player i is rational.

• $W_1$ is the subset of $W_0$ where each player i's level-0 belief $B^i_0$ is such that
$\mathit{strat}(B^i_0) = S_{-i}$; that is, each player considers all strategy profiles possible at
the top level. (This captures the intuition that players' primary beliefs are such
that they make no assumptions about the other players' strategies.) Let $S^1$ be
the set of strategy profiles that appear in worlds in $W_1$.

• ...

• $W_k$ is the subset of $W_{k-1}$ where each player i's level-$(k-1)$ belief is such that
$\mathit{strat}(B^i_{k-1}) = S^{k-1}_{-i}$; that is, each player's level-$(k-1)$ belief is that all players
use strategies and beliefs from a world in $W_{k-1}$. (This captures the intuition that
players' level-$(k-1)$ belief is that the other players use at least $k - 1$ levels of
rationality.) Let $S^k$ be the set of strategy profiles that appear in worlds in $W_k$.

It now easily follows by induction that $\mathcal{RM}^k(S) = \mathit{strat}(W_k)$. Completeness is
required to ensure that for every world $w \in W_k$ and every player i, there exists some
world $w' \in W_k$ such that $\mathit{strat}(w) = \mathit{strat}(w')$, player i's kth-order beliefs
in $w$ and $w'$ are the same (i.e., the first $k - 1$ levels of beliefs are the same in $w$ and $w'$),
and for every $j > k - 1$, i's level-j belief in $w'$ is the same as i's level-$(k-1)$ belief in
$w'$; this ensures that $S^k = \mathcal{RM}(S^{k-1})$.

We conclude that $\mathcal{RM}^\infty(S) = \bigcap_{k \in \mathbb{N}} \mathit{strat}(W_k)$. Intuitively, this means that a strategy that
survives iterated regret minimization is used in a world where each player is rational,
each player's primary belief is that everyone else is using an arbitrary strategy, each
player's secondary belief is that everyone else is rational, and so on.

B Proofs 

We provide proofs of all the results not proved in the main text here. We repeat the 
statements for the convenience of the reader. 

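Theorem 3.3 Let $G = ([n], A, u)$ be a strategic game. If $S$ is a closed, nonempty
set of strategies of the form $S_1 \times \cdots \times S_n$, then $\mathcal{RM}^\infty(S)$ is nonempty, $\mathcal{RM}^\infty(S) =
\mathcal{RM}^\infty_1(S) \times \cdots \times \mathcal{RM}^\infty_n(S)$, and $\mathcal{RM}(\mathcal{RM}^\infty(S)) = \mathcal{RM}^\infty(S)$.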

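Proof: We start with the case of pure strategies, since it is so simple. Since $\mathcal{D}(S) \subseteq S$
for any deletion operator $\mathcal{D}$ and set $S$ of strategy profiles, the sets $\mathcal{RM}^k(A)$ form a
decreasing sequence, and when we have equality $\mathcal{RM}^{k+1}(A) = \mathcal{RM}^k(A)$, then clearly
$\mathcal{RM}^\infty(A) = \mathcal{RM}^k(A)$. Since $A$ is finite by assumption, after some point
we must have equality. Moreover, we have $\mathcal{RM}(\mathcal{RM}^\infty(A)) = \mathcal{RM}^\infty(A)$.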
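
The pure-strategy argument can be illustrated with a minimal sketch, under assumptions that are not part of the proof: a two-player game, the Traveler's Dilemma with claims 2 to 100 as the test case, and a reward/penalty of 2 (the usual value for that game). The function names are hypothetical. The loop deletes non-regret-minimizing strategies until nothing changes, which must happen after finitely many rounds because the strategy sets are finite.

```python
def traveler_payoff(mine, theirs, bonus=2):
    """Payoff to the player claiming `mine` (bonus of 2 is an assumption)."""
    if mine == theirs:
        return mine
    return mine + bonus if mine < theirs else theirs - bonus


def regret_minimizers(own, other, payoff):
    """Strategies in `own` whose worst-case regret against `other` is minimal."""
    def regret(a):
        # Regret of a against o: best payoff available in `own` minus payoff of a.
        return max(max(payoff(b, o) for b in own) - payoff(a, o) for o in other)

    regrets = {a: regret(a) for a in own}
    least = min(regrets.values())
    return [a for a in own if regrets[a] == least]


def iterate_regret_minimization(s1, s2, payoff1, payoff2):
    """Return the sequence of strategy sets up to and including the fixed point."""
    rounds = [(list(s1), list(s2))]
    while True:
        cur1, cur2 = rounds[-1]
        nxt = (regret_minimizers(cur1, cur2, payoff1),
               regret_minimizers(cur2, cur1, payoff2))
        if nxt == rounds[-1]:  # equality reached: this is the fixed point
            return rounds
        rounds.append(nxt)


claims = list(range(2, 101))
rounds = iterate_regret_minimization(claims, claims, traveler_payoff, traveler_payoff)
final_1, final_2 = rounds[-1]
```

Under these assumptions the first round of deletion leaves the claims 96 to 100 and the second leaves only 97, after which a further application of the operator changes nothing.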

To deal with the general case, we must work a little harder. The fact that $\mathcal{RM}^\infty(S) =
\mathcal{RM}_1^\infty(S) \times \cdots \times \mathcal{RM}_n^\infty(S)$ is straightforward and left to the reader. To prove the other
parts, we first need the following lemma.

Lemma B.1 Let $S$ be a nonempty closed set of strategies. Then $\mathcal{RM}(S)$ is closed and
nonempty.

Proof: We start by showing that $\mathit{regret}_i^S$ is continuous. First note that $\mathit{regret}_i^{S_i}(a_i \mid a_{-i})$
is a continuous function of $a_{-i}$. By the closedness (and hence compactness) of $S$ it
follows that $\mathit{regret}_i^S(a_i) = \max_{a_{-i} \in S_{-i}} \mathit{regret}_i^{S_i}(a_i \mid a_{-i})$ is well defined (even though
it involves a max). To see that $\mathit{regret}_i^S$ is continuous, suppose not. This means
that there exist some $a$, $\delta$ such that for all $n$, there exists an $a_n$ within $1/n$ of $a$
such that $|\mathit{regret}_i^S(a_n) - \mathit{regret}_i^S(a)| > \delta$. By compactness of $S$, it follows by the
Bolzano-Weierstrass theorem [Rudin 1976] that there exists a convergent subsequence
$(a_{n_m}, \mathit{regret}_i^S(a_{n_m}))$ which converges to $(a, b)$. We have $|b - \mathit{regret}_i^S(a)| > \delta$, which is a
contradiction.

Now, to see that $\mathcal{RM}(S)$ is nonempty, it suffices to observe that, because $S$ is
compact, and $\mathit{regret}_i^S$ is continuous, for each player $i$, there must be some strategy
$a_i$ such that $\mathit{regret}_i^S(a_i) = \mathit{minregret}_i^S$. Thus, $a = (a_1, \ldots, a_n) \in \mathcal{RM}(S)$.

To show that $\mathcal{RM}(S)$ is closed, suppose that $(\sigma^m)_{m=1,2,3,\ldots}$ is a sequence of mixed
strategy profiles in $\mathcal{RM}(S)$ converging to $\sigma$ (in the sense that the probability placed
by $\sigma^m$ on a pure strategy profile converges to the probability placed by $\sigma$ on that
strategy profile) and, by way of contradiction, that $\sigma \notin \mathcal{RM}(S)$. Thus, for some player
$i$, $\sigma_i \notin \mathcal{RM}_i(S)$. Note that, since $\mathcal{RM}(S) \subseteq S$, the sequence $(\sigma^m)_{m=1,2,3,\ldots}$ is in $S$;
since $S$ is closed, $\sigma \in S$. Let $\mathit{minregret}_i^S = b$. Since $\sigma_i^m \in \mathcal{RM}_i(S)$, we must have
that $\mathit{regret}_i^S(\sigma_i^m) = b$ for all $m$. Since $\sigma_i \notin \mathcal{RM}_i(S)$, there must exist some strategy
profile $\tau \in S$ such that $U_i(\tau) - U_i(\sigma_i, \tau_{-i}) = b' > b$. But by the continuity of utility,
$\lim_{m \to \infty} U_i(\tau) - U_i(\sigma_i^m, \tau_{-i}) = b'$. This contradicts the assumption that $\mathit{regret}_i^S(\sigma_i^m) = b$ for all $m$.
Thus, $\mathcal{RM}(S)$ must be closed.

Returning to the proof of Theorem 3.3, note that since $S$ is closed and nonempty,
it follows by Lemma B.1 that $\mathcal{RM}^k(S)$ is a closed nonempty set for all $k$. Additionally,
note that $\mathcal{RM}^k(S)$ can be viewed as a subset of the compact set $[0,1]^{|A|}$ (since a
probability distribution on a finite set $X$ can be identified with a tuple of numbers in
$[0,1]^{|X|}$); it follows that $\mathcal{RM}^k(S)$ is also bounded, and thus compact. Finally, note that
the set $\{\mathcal{RM}^k(S) : k = 1, 2, 3, \ldots\}$ has the finite intersection property: the intersection
of any finite collection of its elements is nonempty (since it is equal to the smallest
element). The compactness of $S$ now guarantees that the intersection of all the sets
in a collection of closed subsets of $S$ with the finite intersection property is nonempty
[Munkres 2000]. In particular, it follows that $\mathcal{RM}^\infty(S)$ is nonempty.

To see that $\mathcal{RM}^\infty(S)$ is a fixed point of the deletion process, suppose, by way of
contradiction, that $a_i \in \mathcal{RM}_i^\infty(S) - \mathcal{RM}_i(\mathcal{RM}^\infty(S))$. Let $\mathit{minregret}_i^{\mathcal{RM}^\infty(S)} = b$ and
choose $a_i' \in \mathcal{RM}_i^\infty(S)$ such that $\mathit{regret}_i^{\mathcal{RM}^\infty(S)}(a_i') = b$. Since $a_i \notin \mathcal{RM}_i(\mathcal{RM}^\infty(S))$,
it must be the case that $\mathit{regret}_i^{\mathcal{RM}^\infty(S)}(a_i) = b' > b$. By assumption, $a_i \in \mathcal{RM}_i^\infty(S)$,
so $a_i \in \mathcal{RM}_i^k(S)$ for all $k$; moreover, $\mathit{regret}_i^{\mathcal{RM}^k(S)}(a_i) \geq b'$. Since $a_i \in \mathcal{RM}_i^{k+1}(S)$,
it follows that $\mathit{minregret}_i^{\mathcal{RM}^k(S)} \geq b'$. This means that there exists a strategy profile
$\tau^k \in \mathcal{RM}^k(S)$ such that $U_i(\tau^k) - U_i(a_i', \tau^k_{-i}) \geq b'$. By the Bolzano-Weierstrass theorem,
the sequence of strategies $(\tau^k)_{k=1,2,\ldots}$ has a convergent subsequence $(\tau^{k_j})_{j=1,2,\ldots}$ that
converges to some strategy profile $\tau$. Since $\tau^k \in \mathcal{RM}^m(S)$ for all $k \geq m$, it must be
the case that, except for possibly a finite initial segment, this convergent subsequence
is in $\mathcal{RM}^m(S)$. Since $\mathcal{RM}^m(S)$ is closed, $\tau$, the limit of the convergent subsequence,
is in $\mathcal{RM}^m(S)$ for all $m \geq 1$. Thus, $\tau \in \mathcal{RM}^\infty(S)$. Now a simple continuity argument
shows that $U_i(\tau) - U_i(a_i', \tau_{-i}) \geq b' > b$, a contradiction.

Lemma 3.13 $\mathit{regret}_1^S(s_{ad}) = (n-1)(u_3 - u_1) + \max(-u_1, u_2 - u_3)$. Moreover, if $s$ is
a strategy for player 1 where he plays $c$ before seeing player 2 play $c$ (i.e., where player
1 either starts out playing $c$ or plays $c$ at the $k$th move, for some $k > 1$, after seeing player 2
play $d$ for the first $k-1$ moves), then $\mathit{regret}_1^S(s) > (n-1)(u_3 - u_1) + \max(-u_1, u_2 - u_3)$.

Proof: Let $s_c$ be the strategy where player 2 starts out playing $d$ and then plays $c$ to
the end of the game if player 1 plays $c$ on the first move, and plays $d$ to the end of the game if player 1
plays $d$. We have $\mathit{regret}_1^{S_1}(s_{ad} \mid s_c) = (n-1)(u_3 - u_1) - u_1$: player 1 gets $n u_1$ with
$(s_{ad}, s_c)$, and could have gotten $(n-1)u_3$ if he had cooperated on the first move and
then always defected.

Let $s_c'$ be the strategy where player 2 starts out playing $c$ and then plays $c$ to the
end of the game if player 1 plays $c$ on the first move, and plays $d$ to the end of the game if player 1
plays $d$. It is easy to see that $\mathit{regret}_1^{S_1}(s_{ad} \mid s_c') = (n-1)(u_3 - u_1) + (u_2 - u_3)$. Thus,
$\mathit{regret}_1^S(s_{ad}) \geq (n-1)(u_3 - u_1) + \max(-u_1, u_2 - u_3)$. We now show that $\mathit{regret}_1^S(s_{ad}) =
(n-1)(u_3 - u_1) + \max(-u_1, u_2 - u_3)$. For suppose that the regret is maximized if player
2 plays some strategy $s$, and player 1's best response to $s$ is $s'$. Consider the first place
where the play of $(s', s)$ differs from that of $(s_{ad}, s)$. This must happen after a move
where player 1 plays $c$ with $s'$. For as long as player 1 plays $d$ with $s'$, player 2 cannot
distinguish $s'$ from $s_{ad}$, and so does the same thing in response. So suppose that player
1 plays $c$ at move $k$ with $s'$. If player 2 plays $c$ at step $k$, player 1 gets a payoff of $u_3$
with $(s_{ad}, s)$ at step $k$ and a payoff of $u_2$ with $(s', s)$. Thus, player 1's total payoff with
$(s_{ad}, s)$ is at least $(n-1)u_1 + u_3$, while his payoff with $(s', s)$ is at most $(n-1)u_3 + u_2$;
thus, his regret is at most $(n-1)(u_3 - u_1) + (u_2 - u_3)$. On the other hand, if player
2 plays $d$ with $s$ at step $k$, then player 1's payoff at step $k$ with $(s', s)$ is 0, while his
payoff at step $k$ with $(s_{ad}, s)$ is $u_1$. Thus, his regret is at most $(n-1)(u_3 - u_1) - u_1$.
(In both cases, the regret can be that high only if $k = 1$.)

We next show that if $s$ is a strategy for player 1 where he plays $c$ before seeing
player 2 play $c$, then $\mathit{regret}_1^S(s) > (n-1)(u_3 - u_1) + \max(-u_1, u_2 - u_3)$. Suppose
that $k$ is the first move where player 1 plays $c$ despite not having seen $c$ before. If
$k = 1$ (so that player 1 cooperates on the first move), let $s_d$ be the strategy where
player 2 plays $d$ for the first move, then plays $c$ to the end of the game if player 1
has played $d$ for the first move, and otherwise plays $d$ to the end of the game. It is
easy to see that $\mathit{regret}_1^{S_1}(s \mid s_d) = (n-1)(u_3 - u_1) + u_1$. On the other hand,
if $k > 1$, then the regret $\mathit{regret}_1^{S_1}(s \mid s_c) \geq (n-1)(u_3 - u_1)$. Thus, $\mathit{regret}_1^S(s) >
(n-1)(u_3 - u_1) + \max(-u_1, u_2 - u_3)$.
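
The value of the regret of always defecting can be checked by brute force for very short games. The sketch below is only an illustration under stated assumptions: the stage payoffs are inferred from the proof (both defect: $u_1$ each; both cooperate: $u_2$ each; a defector facing a cooperator gets $u_3$ and the cooperator gets 0), player 2's strategies are represented as maps from player 1's earlier moves to an action, and all function names are hypothetical. It compares the brute-force regret with the expression $(n-1)(u_3 - u_1) + \max(-u_1, u_2 - u_3)$ obtained in the proof.

```python
from itertools import product


def stage_payoff(mine, theirs, u1, u2, u3):
    """Player 1's one-round payoff (payoff table inferred from the proof)."""
    if mine == "d":
        return u1 if theirs == "d" else u3
    return u2 if theirs == "c" else 0


def total_payoff(moves, opponent, u):
    """Payoff of a fixed move sequence for player 1 against a strategy of player 2.

    `opponent` maps the tuple of player 1's earlier moves to player 2's move;
    player 2's own past moves are determined by that history, so this loses nothing.
    """
    return sum(stage_payoff(m, opponent[moves[:t]], *u) for t, m in enumerate(moves))


def regret_of_always_defect(n, u):
    """Maximum, over all deterministic strategies of player 2, of the regret of s_ad."""
    histories = [h for t in range(n) for h in product("cd", repeat=t)]
    sequences = list(product("cd", repeat=n))
    always_defect = ("d",) * n
    worst = 0
    for choice in product("cd", repeat=len(histories)):
        opponent = dict(zip(histories, choice))
        # Against a deterministic opponent, a best response is a fixed move sequence.
        best = max(total_payoff(seq, opponent, u) for seq in sequences)
        worst = max(worst, best - total_payoff(always_defect, opponent, u))
    return worst


def closed_form(n, u):
    u1, u2, u3 = u
    return (n - 1) * (u3 - u1) + max(-u1, u2 - u3)
```

For example, `regret_of_always_defect(3, (1, 3, 5))` and `closed_form(3, (1, 3, 5))` agree. The enumeration grows doubly exponentially in $n$, so it is only practical for $n \le 4$ or so.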

Proposition 3.18 Let $G = ([n], A, u)$ be a strategic game and let $\sigma_i$ be a mixed
strategy for player $i$. Then $\mathit{regret}_i^{\Sigma}(\sigma_i) = \max_{a_{-i} \in A_{-i}} \mathit{regret}_i^{\Sigma_i}(\sigma_i \mid a_{-i})$.

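Proof: Note that, for all strategy profiles $\sigma_{-i}$, there exists some strategy $\sigma_i^*$ such that
$U_i^{\Sigma_i}(\sigma_{-i}) = U_i(\sigma_i^*, \sigma_{-i})$. It follows that

\[
U_i^{\Sigma_i}(\sigma_{-i})
= \sum_{a_{-i} \in A_{-i}} \sigma_{-i}(a_{-i}) \, U_i(\sigma_i^*, a_{-i})
\leq \sum_{a_{-i} \in A_{-i}} \sigma_{-i}(a_{-i}) \, U_i^{\Sigma_i}(a_{-i}).
\]

Thus

\[
\begin{aligned}
\mathit{regret}_i^{\Sigma_i}(\sigma_i \mid \sigma_{-i})
&= U_i^{\Sigma_i}(\sigma_{-i}) - U_i(\sigma_i, \sigma_{-i}) \\
&\leq \sum_{a_{-i} \in A_{-i}} \sigma_{-i}(a_{-i}) \, \mathit{regret}_i^{\Sigma_i}(\sigma_i \mid a_{-i}) \\
&\leq \max_{a_{-i} \in A_{-i}} \mathit{regret}_i^{\Sigma_i}(\sigma_i \mid a_{-i}).
\end{aligned}
\]

It follows that

\[
\mathit{regret}_i^{\Sigma}(\sigma_i)
= \max_{\sigma_{-i} \in \Sigma_{-i}} \mathit{regret}_i^{\Sigma_i}(\sigma_i \mid \sigma_{-i})
= \max_{a_{-i} \in A_{-i}} \mathit{regret}_i^{\Sigma_i}(\sigma_i \mid a_{-i}).
\]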
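
The statement of the proposition can be checked numerically on a small example. The following is a minimal sketch, not part of the proof: the 3x3 payoff matrix `U`, the mixed strategy `sigma`, and the grid of opponent mixtures are all arbitrary choices made for illustration, and the function names are hypothetical. It verifies that the regret of `sigma` against any mixed strategy on the grid never exceeds its largest regret against a pure strategy.

```python
from fractions import Fraction
from itertools import product

# Row player's payoffs U[r][c]; an arbitrary illustrative matrix (assumption).
U = [[3, 0, 5],
     [1, 4, 2],
     [2, 2, 2]]
N = len(U)


def expected(row_mix, col_mix):
    """Expected payoff to the row player under independent mixing."""
    return sum(p * q * U[r][c]
               for r, p in enumerate(row_mix)
               for c, q in enumerate(col_mix))


def pure(k):
    """The mixed strategy that puts probability 1 on action k."""
    return [Fraction(int(j == k)) for j in range(N)]


def regret(row_mix, col_mix):
    # By linearity, the best reply over mixed strategies is attained at a pure one.
    best = max(expected(pure(k), col_mix) for k in range(N))
    return best - expected(row_mix, col_mix)


sigma = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

# All opponent mixtures whose probabilities are multiples of 1/4.
grid = [[Fraction(a, 4), Fraction(b, 4), Fraction(4 - a - b, 4)]
        for a, b in product(range(5), repeat=2) if a + b <= 4]

max_over_pure = max(regret(sigma, pure(c)) for c in range(N))
max_over_grid = max(regret(sigma, q) for q in grid)
```

Because the grid contains the pure strategies, the two maxima coincide here; a finer grid would only add points whose regret is bounded by `max_over_pure`.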



Lemma 3.23 $\mathit{regret}_1^{\Sigma}(\sigma) < 3$.

Proof: By Proposition 3.18, to compute $\mathit{regret}_1^{\Sigma}(\sigma)$, it suffices to compute $\mathit{regret}_1^{\Sigma_1}(\sigma \mid a)$
for each action $a$ of player 2. If player 1 plays $\sigma$ and player 2 plays 100, then
the best response for player 1 is 99, giving him a payoff of 101. The payoff with $\sigma$ is

\[
\begin{aligned}
& 100 \times 1/2 + 101 \times 1/4 + 100 \times 1/8 + \cdots + 5 \times 2^{-98} + 4 \times 2^{-98} \\
={}& 102 \times 1/2 + 101 \times 1/4 + 100 \times 1/8 + \cdots + 5 \times 2^{-98} + 4 \times 2^{-98} - 1 \\
={}& 4 \times (1/2 + 1/4 + \cdots + 2^{-98} + 2^{-98}) \\
& + 1 \times (1/2 + 1/4 + \cdots + 2^{-98}) \\
& + 1 \times (1/2 + 1/4 + \cdots + 2^{-97}) + \cdots + 1 \times 1/2 - 1 \\
={}& 4 + (1 - 2^{-98}) + (1 - 2^{-97}) + \cdots + (1 - 1/2) - 1 \\
={}& 102 - (1/2 + 1/4 + \cdots + 2^{-98}) - 1 \\
={}& 100 + 2^{-98},
\end{aligned}
\]

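so the regret is less than 1. Similarly, if player 2 plays $k$ with $2 < k \leq 99$, the best
response for player 1 is $k-1$, which would give player 1 a payoff of $k+1$, while the
payoff from $\sigma$ is

\[
\begin{aligned}
& (k-2)(1/2 + \cdots + 1/2^{100-k}) + k \times 1/2^{101-k} + (k+1) \times 1/2^{102-k} + k \times 1/2^{103-k} \\
& + (k-1) \times 1/2^{104-k} + \cdots + 5 \times 1/2^{98} + 4 \times 1/2^{98}.
\end{aligned}
\]

Thus, player 1's regret if player 2 plays $k$ is

\[
\begin{aligned}
& 3 \times (1/2 + \cdots + 1/2^{100-k} + 1/2^{105-k}) + 2 \times 1/2^{104-k} + 1 \times (1/2^{101-k} + 1/2^{103-k}) \\
& \quad + 1/2^{106-k} (4 + 5 \times 1/2 + 6 \times 1/4 + \cdots + (k-4) \times 1/2^{k-8}) + (k-3) \times 1/2^{98} \\
<{}& 3 \times (1/2 + \cdots + 1/2^{100-k} + 1/2^{105-k}) + 2 \times 1/2^{104-k} + 1 \times (1/2^{101-k} + 1/2^{103-k}) + 10 \times 1/2^{106-k} \\
={}& 3 \times (1 - 1/2^{100-k}) + 1/2^{100-k} \\
={}& 3 - 1/2^{99-k} \\
\leq{}& 3 \times (1 - 1/2^{101-k}) < 3,
\end{aligned}
\]

where the first inequality bounds the last two terms using $4 + 5 \times 1/2 + 6 \times 1/4 + \cdots = 10$.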
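
The bound can also be checked with exact rational arithmetic. The sketch below rests on assumptions that are not spelled out in this appendix: the distribution $\sigma$ is read off the payoff expansion above (probability $1/2$ on 100, $1/4$ on 99, and so on down to $2^{-98}$ on 3, with the remaining $2^{-98}$ on 2), claims range from 2 to 100, and the reward/penalty is 2. The names are hypothetical. Following Proposition 3.18, it is enough to consider pure claims by player 2.

```python
from fractions import Fraction

CLAIMS = range(2, 101)


def payoff(mine, theirs, bonus=2):
    """Traveler's Dilemma payoff to the player claiming `mine` (bonus assumed to be 2)."""
    if mine == theirs:
        return mine
    return mine + bonus if mine < theirs else theirs - bonus


# sigma as read off the expansion above: 100 w.p. 1/2, 99 w.p. 1/4, ...,
# 3 w.p. 2^-98, and the leftover 2^-98 on the claim 2.
sigma = {j: Fraction(1, 2 ** (101 - j)) for j in range(3, 101)}
sigma[2] = Fraction(1, 2 ** 98)


def expected_payoff(k):
    """Expected payoff of sigma when player 2 claims k."""
    return sum(p * payoff(j, k) for j, p in sigma.items())


def regret_against(k):
    """Best pure-reply payoff against k minus the expected payoff of sigma."""
    best = max(payoff(j, k) for j in CLAIMS)
    return best - expected_payoff(k)


regrets = {k: regret_against(k) for k in CLAIMS}
worst_k = max(regrets, key=regrets.get)
worst_regret = regrets[worst_k]
```

Under these assumptions the payoff against 100 is exactly $100 + 2^{-98}$, and the largest regret occurs against the claim 3 and equals $3 - 5 \times 2^{-98}$, which is strictly less than 3.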
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